Skip to content

Conversation

Sithaarth24
Copy link
Contributor

@Sithaarth24 Sithaarth24 commented Sep 24, 2025

Summary by CodeRabbit

  • New Features

    • Admin UI: choose Authentication Type (delegated vs app-only) and enter Tenant ID for Microsoft integrations; distinct success messages and integration status per type.
    • Service-account (app-only) SharePoint ingestion with background sync and incremental delta processing.
  • Improvements

    • OAuth flows unified to use a default Microsoft scope and improved token refresh/retry across modes.
    • Per-drive delta tokens and enhanced sync progress reporting.
  • Chores

    • Added Azure Identity dependency.

@coderabbitai
Copy link
Contributor

coderabbitai bot commented Sep 24, 2025

Note

Other AI code review bot(s) detected

CodeRabbit has detected other AI code review bot(s) in this pull request and will avoid duplicating their findings in the review comments. This may lead to a less comprehensive review.

Walkthrough

Adds Microsoft delegated and app‑only (service‑account) OAuth flows: frontend authType branching and model changes; new admin API to register service credentials; DB connector token-refresh helpers; Microsoft Graph client and helpers; SharePoint discovery, delta ingestion/sync; retry/auth plumbing, types, queue wiring, and Azure dependency.

Changes

Cohort / File(s) Summary
Frontend admin Microsoft auth
frontend/src/routes/_authenticated/admin/integrations/microsoft.tsx
Adds authType (delegated
Admin API & server routing
server/api/admin.ts, server/server.ts
Adds AddServiceConnectionMicrosoft handler and POST endpoints (/microsoft/service_account, /admin/microsoft/service_account) validated by microsoftServiceSchema; authenticates via CustomServiceAuthProvider, stores encrypted creds, creates ServiceAccount connector, and triggers ingestion.
DB connector & token refresh
server/db/connector.ts
Adds getMicrosoftAuthConnectorWithCredentials; renames IsTokenExpiredIsExpired, refreshes service-account tokens via CustomServiceAuthProvider, updates stored credentials, and returns connector with refreshed tokens.
Microsoft Graph client utilities
server/integrations/microsoft/client.ts
MicrosoftGraphClient now includes optional tenantId and refreshToken, removes authProvider; adds updateMicrosoftGraphClient; updates createMicrosoftGraphClient signature and graph helpers (makeGraphApiCall, downloadFileFromGraph, paged/beta variants) to use the new client shape.
Microsoft config / scopes
server/integrations/microsoft/config.ts
Replaces granular scopes array with ["https://graph.microsoft.com/.default","offline_access"].
Service-account ingestion & SharePoint
server/integrations/microsoft/index.ts, server/integrations/microsoft/sharepoint/index.ts, server/integrations/microsoft/sync.ts
Adds handleMicrosoftServiceAccountIngestion and handleMicrosoftServiceAccountChanges; new SharePoint discovery (discoverSharePointSites, discoverSiteDrives) and processSiteDrives; delta-based per-drive processing, file change/delete handlers, delta token storage, and per-drive ingestion logic; removes some legacy public file-processing exports from index.ts.
Microsoft helpers & content processing
server/integrations/microsoft/utils.ts
Adds downloadDir, deleteDocument, safeLoadPDF, CustomServiceAuthProvider, permission helpers, processMicrosoftPDFs/processMicrosoftWord/processMicrosoftExcel, processFileContent, logSyncProgress, and delta-init helpers.
Queue integration
server/queue/index.ts
Sync worker now invokes handleMicrosoftServiceAccountChanges() alongside Google service-account sync.
Types & schemas
server/types.ts
Adds microsoftServiceSchema/microsoftService, MicrosoftServiceCredentials type, MicrosoftSharepointDeltaTokenSchema and includes it in the ChangeToken union.
Retry / auth utilities
server/utils.ts
retryWithBackoff now accepts `OAuth2Client
Deps & package
server/package.json
Adds dependency @azure/identity and minor devDependency formatting adjustments.

Sequence Diagram(s)

sequenceDiagram
  autonumber
  actor Admin as Admin User
  participant FE as Frontend (Microsoft Tab)
  participant API as Server API
  participant Auth as CustomServiceAuthProvider
  participant DB as DB (connectors)
  participant Ing as Ingestion Worker

  Admin->>FE: choose authType=appOnly, enter tenantId+creds, submit
  FE->>API: POST /admin/microsoft/service_account {tenantId, clientId, clientSecret}
  API->>Auth: getAccessTokenWithExpiry()
  Auth-->>API: {access_token, expires_at}
  API->>DB: create connector (ServiceAccount) + store encrypted creds
  API->>Ing: trigger handleMicrosoftServiceAccountIngestion (async)
  API-->>FE: 200 {connectorId, expires_at}
  FE->>FE: setOAuthIntegrationStatus(OAuthConnecting)
Loading
sequenceDiagram
  autonumber
  participant Q as Sync Queue
  participant MS as handleMicrosoftServiceAccountChanges
  participant DB as DB (connectors + tokens)
  participant GC as MicrosoftGraphClient
  participant SP as Microsoft Graph / SharePoint
  participant Vespa as Index

  Q->>MS: run()
  MS->>DB: load service-account connectors + creds
  MS->>GC: create/update graph client (tenantId, creds)
  MS->>SP: discover sites & drives
  loop per drive
    MS->>SP: GET delta changes
    alt deleted
      MS->>Vespa: delete document
    else added/updated
      MS->>SP: download file & permissions
      MS->>Vespa: upsert document/chunks/ACLs
    end
    MS->>DB: persist deltaLinks + lastSyncedAt
  end
Loading

Estimated code review effort

🎯 5 (Critical) | ⏱️ ~120+ minutes

Possibly related PRs

Suggested reviewers

  • zereraz
  • shivamashtikar
  • kalpadhwaryu
  • junaid-shirur

Poem

I’m a rabbit with a service key,
I hop through sites and delta trees,
I fetch the drives and stitch the streams,
I queue the sync and index dreams,
Tokens snug — all systems green. 🐇✨

Pre-merge checks and finishing touches

✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title Check ✅ Passed The title concisely and accurately describes the primary feature introduced by this pull request, namely the Microsoft SharePoint document ingestion and synchronization flows, without extraneous detail or noise.
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
✨ Finishing touches
  • 📝 Generate Docstrings
🧪 Generate unit tests
  • Create PR with unit tests
  • Post copyable unit tests in a comment
  • Commit unit tests in branch feat/microsoft-sharepoint

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @Sithaarth24, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly expands Microsoft integration capabilities by introducing comprehensive ingestion and synchronization for SharePoint documents. It provides flexibility in authentication through both delegated OAuth and app-only service accounts, complete with a new frontend configuration interface. The backend now manages service account credentials, performs initial data ingestion from SharePoint sites and drives, and maintains an up-to-date index using a delta synchronization process. This ensures that SharePoint content is discoverable and current within the system.

Highlights

  • Microsoft SharePoint Integration: Introduced full ingestion and synchronization capabilities for Microsoft SharePoint documents, allowing the system to connect to SharePoint sites and drives.
  • Flexible Authentication Methods: Added support for two authentication types for Microsoft integrations: 'Delegated' (user-based OAuth) and 'App-only' (service account with Client ID, Client Secret, and Tenant ID). The frontend now provides a UI to select between these options.
  • Service Account Management: Implemented new backend endpoints and logic to handle the creation, validation, and token refreshing for Microsoft service account connectors, ensuring secure and continuous access to SharePoint data.
  • Delta Synchronization for SharePoint: Developed a robust delta synchronization mechanism for SharePoint, enabling efficient tracking of changes (additions, updates, deletions) across sites and drives, and updating the search index accordingly.
  • Refactored Microsoft Graph Client: Enhanced the Microsoft Graph client and utility functions to seamlessly support both delegated and service account authentication flows, including dynamic token refreshing for both types.
  • Dependency Updates: Updated project dependencies, including the addition of @azure/identity for Azure authentication and an upgrade to @xyne/vespa-ts.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces a significant new feature for Microsoft SharePoint document ingestion and synchronization using service account authentication. The changes are extensive, touching the frontend UI for configuration, backend API endpoints, database schemas, and the core integration logic. Overall, the implementation is well-structured, with clear separation of concerns for SharePoint site/drive discovery and file processing. However, I've identified a couple of critical issues related to token persistence and leftover test code that must be addressed before merging. I've also included some medium-severity suggestions to improve type safety and correct minor logic errors.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 13

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (6)
server/integrations/microsoft/index.ts (1)

1082-1089: Undefined downloadDir variable.

The downloadDir variable is used but not defined anywhere in the file.

Either define downloadDir or remove this development-only code:

-if (process.env.NODE_ENV !== "production") {
-  const init = () => {
-    if (!fs.existsSync(downloadDir)) {
-      fs.mkdirSync(downloadDir, { recursive: true })
-    }
-  }
-  init()
-}

If this is needed for development, define the constant:

+const downloadDir = path.join(__dirname, 'downloads')
+
 if (process.env.NODE_ENV !== "production") {
   const init = () => {
     if (!fs.existsSync(downloadDir)) {
       fs.mkdirSync(downloadDir, { recursive: true })
     }
   }
   init()
 }
server/integrations/microsoft/client.ts (5)

22-37: Fix stale token exposure in getCurrentTokens (captures initial values).

getCurrentTokens() closes over the constructor args, so it won’t reflect tokens updated via updateMicrosoftGraphClient. Also, there’s no place to store an updated expiry.

Apply this diff to add a mutable expiresAt field on the client shape:

 export interface MicrosoftGraphClient {
   client: Client
   accessToken: string
   refreshToken?: string // Only for delegated clients
   tenantId?: string // Only for service clients
   clientId: string
   clientSecret: string
   betaClient: Client
+  expiresAt?: Date
   // Helper methods to get updated tokens after refresh
   getCurrentTokens(): {
     accessToken: string
     refreshToken?: string
     expiresAt?: Date
   }
 }

Then update the factory and updater (see comments on Lines 39-57 and 79-94).


59-96: Ensure getCurrentTokens reflects current client state.

Return current fields instead of closed-over variables; also surface expiresAt on the object.

Apply this diff:

 export const createMicrosoftGraphClient = (
   accessToken: string,
   clientId: string,
   clientSecret: string,
   refreshToken?: string,
   tenantId?: string,
   tokenExpiresAt?: Date,
 ): MicrosoftGraphClient => {
   const authProvider = new CustomAuthProvider(accessToken)

   const client = Client.initWithMiddleware({
     authProvider,
     defaultVersion: "v1.0",
   })
   const betaClient = Client.initWithMiddleware({
     authProvider,
     defaultVersion: "beta",
   })

   return {
     client,
     accessToken,
     refreshToken,
     tenantId,
     clientId,
     clientSecret,
     betaClient,
+    expiresAt: tokenExpiresAt,
     getCurrentTokens() {
       return {
-        accessToken,
-        refreshToken,
-        expiresAt: tokenExpiresAt,
+        accessToken: this.accessToken,
+        refreshToken: this.refreshToken,
+        expiresAt: this.expiresAt,
       }
     },
   }
 }

113-121: Graph SDK get() does not take arguments; drop options from Beta call.

Passing options to .get(options) is not supported by @microsoft/microsoft-graph-client.

Apply this diff:

 export const makeBetaGraphApiCall = async (
   graphClient: MicrosoftGraphClient,
   endpoint: string,
-  options?: any,
 ): Promise<any> => {
   return retryWithBackoff(
     async () => {
-      const result = await graphClient.betaClient.api(endpoint).get(options)
+      const result = await graphClient.betaClient.api(endpoint).get()
       return result
     },
     `Making Microsoft Graph API call to ${endpoint}`,
     Apps.MicrosoftDrive,
     1,
     graphClient,
   )
 }

Also applies to: 123-127


129-152: get() doesn’t accept args; also options is unused.

Use headers via .header() as you do, and call .get() without parameters. Consider removing the options param from the signature.

Apply this diff:

 export const makeGraphApiCallWithHeaders = async (
   graphClient: MicrosoftGraphClient,
   endpoint: string,
   headers: Record<string, string>,
-  options?: any,
 ): Promise<any> => {
   return retryWithBackoff(
     async () => {
       const request = graphClient.client.api(endpoint)

       // Add custom headers
       Object.entries(headers).forEach(([key, value]) => {
         request.header(key, value)
       })

-      const result = await request.get(options)
+      const result = await request.get()
       return result
     },
     `Making Microsoft Graph API call to ${endpoint} with headers`,
     Apps.MicrosoftDrive,
     1,
     graphClient,
   )
 }

154-181: Use full nextLink URL directly; avoid re-parsing the path (version duplication risk).

Feeding url.pathname + url.search to .api() can duplicate /v1.0//beta or drop host-specific routing.

Apply this diff:

   while (nextLink) {
     const response: any = await retryWithBackoff(
       async () => {
-        if (nextLink!.startsWith("http")) {
-          // This is a full URL from @odata.nextLink
-          const url = new URL(nextLink!)
-          const path = url.pathname + url.search
-          return await graphClient.client.api(path).get()
-        } else {
-          // This is a relative path
-          return await graphClient.client.api(nextLink!).get(options)
-        }
+        if (nextLink!.startsWith("http")) {
+          // Use absolute nextLink as-is
+          return await graphClient.client.api(nextLink!).get()
+        }
+        // Relative path
+        return await graphClient.client.api(nextLink!).get(options)
       },
       `Making paginated Microsoft Graph API call to ${nextLink}`,
       Apps.MicrosoftDrive,
       1,
       graphClient,
     )
🧹 Nitpick comments (21)
package.json (1)

2-4: Avoid duplicate deps at repo root; keep runtime deps scoped to server package

Having @azure/identity at the monorepo root while also adding it under server/package.json risks version drift/hoisting quirks. Prefer declaring runtime deps only in the server package (or set up proper workspaces).

Suggested change:

-{
-  "dependencies": {
-    "@azure/identity": "^4.12.0"
-  }
-}
+{
+  "private": true
+}

If you’re using workspaces, add a root "workspaces" key and let the server package own this dependency.

server/db/connector.ts (3)

39-41: Remove test-only/unnecessary imports from a runtime DB module

Importing from node:test (todo) and zod’s date symbol here is unnecessary and can cause bundling/runtime surprises.

Apply this diff:

 import { CustomServiceAuthProvider } from "@/integrations/microsoft/utils"
-import { date } from "zod"
-import { todo } from "node:test"

322-326: Error type/message mismatch for service-account connectors

Throwing NoOauthConnectorFound for a ServiceAccount connector is confusing and can mislead ops/alerts.

  • Introduce a NoServiceAccountConnectorFound error (or generalize the name), and use a message that mentions ServiceAccount.

358-365: Defensive logging is fine

The unsupported app branch is OK; consider reusing the same error wording in both OAuth and ServiceAccount paths for consistency.

frontend/src/routes/_authenticated/admin/integrations/microsoft.tsx (1)

48-49: Remove debug log

Stray console.log in submit path.

-  console.log(isServiceAuth)
server/queue/index.ts (1)

218-225: Run service-account change handlers in parallel to avoid head-of-line blocking

Currently executed sequentially; if one integration is slow it delays the other.

Apply this change:

-      await handleGoogleServiceAccountChanges(boss, job)
-      await handleMicrosoftServiceAccountChanges()
+      await Promise.allSettled([
+        handleGoogleServiceAccountChanges(boss, job),
+        handleMicrosoftServiceAccountChanges(),
+      ])
server/server.ts (1)

1559-1562: Queue init at startup: ensure idempotency and single owner

Confirm initQueue() won’t be executed by multiple processes (e.g., multiple Bun instances) leading to duplicate schedules or boss.start() conflicts. If you horizontally scale, consider leader election or a configuration gate.

If you run multiple server instances, ensure only one runs initQueue(). Otherwise, move queue to a dedicated worker process.

server/api/admin.ts (2)

112-117: Remove unused import.

The todo import from node:test on line 112 appears to be unused and should be removed.

-import { todo } from "node:test"
-import { ClientSecretCredential } from "@azure/identity"
-import { Client as GraphClient } from "@microsoft/microsoft-graph-client"
-import type { AuthenticationProvider } from "@microsoft/microsoft-graph-client"
-import { handleMicrosoftServiceAccountIngestion } from "@/integrations/microsoft"
-import { CustomServiceAuthProvider } from "@/integrations/microsoft/utils"
+import { ClientSecretCredential } from "@azure/identity"
+import { Client as GraphClient } from "@microsoft/microsoft-graph-client"
+import type { AuthenticationProvider } from "@microsoft/microsoft-graph-client"
+import { handleMicrosoftServiceAccountIngestion } from "@/integrations/microsoft"
+import { CustomServiceAuthProvider } from "@/integrations/microsoft/utils"

442-534: Consider improving error handling consistency and adding input validation.

The function has good error handling overall, but could benefit from more specific error types and additional validation for the App type.

  1. Add validation to ensure the app type is supported for service account authentication:
  const app = Apps.MicrosoftDrive
+
+ // Validate that the app supports service account authentication
+ if (!IsMicrosoftApp(app)) {
+   throw new HTTPException(400, {
+     message: "Application type does not support service account authentication",
+   })
+ }
  1. Consider using a more specific error type for authentication failures:
  try {
    const authProvider = new CustomServiceAuthProvider(
      tenantId,
      clientId,
      clientSecret,
    )

    const accessToken = await authProvider.getAccessTokenWithExpiry()
+  } catch (authError) {
+    if (authError instanceof Error && authError.message.includes("authentication")) {
+      throw new HTTPException(401, {
+        message: "Failed to authenticate with Microsoft service account",
+      })
+    }
+    throw authError
+  }
+
+  try {
    const expiresAt = new Date(accessToken.expiresOnTimestamp)
server/integrations/microsoft/sync.ts (1)

83-88: Remove unused import.

The unknown import from zod appears to be unused.

-import { unknown } from "zod"
 import {
   discoverSharePointSites,
   discoverSiteDrives,
   processSiteDrives,
 } from "./sharepoint"
server/utils.ts (1)

13-13: Remove unused import.

The todo import from node:test appears to be unused.

-import { todo } from "node:test"
server/integrations/microsoft/index.ts (1)

851-852: Consider making site filtering configurable.

The SharePoint sites discovery doesn't appear to have any filtering, but the sync module has hardcoded site filters. Consider making this consistent and configurable.

Add an optional filter parameter or configuration:

-    let sites = await discoverSharePointSites(graphClient, email!)
+    // TODO: Add site filtering configuration if needed
+    let sites = await discoverSharePointSites(graphClient, email!)
+    
+    // Optional: Filter sites based on configuration
+    // const siteFilter = config.sharepoint?.allowedSites
+    // if (siteFilter && siteFilter.length > 0) {
+    //   sites = sites.filter(site => siteFilter.includes(site.name))
+    // }
server/integrations/microsoft/client.ts (2)

6-6: Remove unused import or wire it up.

CustomServiceAuthProvider is imported but not used.

Apply this diff:

-import { CustomServiceAuthProvider } from "./utils"

192-205: Prefer getStream() for file content endpoints.

For /content, getStream() avoids JSON assumptions and is clearer. Optional change.

Example (requires a stream-aware helper):

const stream = await graphClient.client.api(endpoint).getStream()
return await streamToBuffer(stream)
server/integrations/microsoft/utils.ts (7)

37-50: Use typed error consistent with Google integration for parity and handling.

Align with DeleteDocumentError pattern to preserve error semantics and observability.

Apply this diff (imports may be needed for the error type):

-    throw new Error(`Error deleting file: ${(err as Error).message}`)
+    // Consider reusing a typed error similar to Google integration
+    throw new Error(`Error deleting file: ${(err as Error).message}`)

If you have a DeleteDocumentError for Microsoft, emit that instead with integration: Apps.MicrosoftDrive.


162-167: Use existing retry helper for content fetches (rate-limit resilience).

Wrap this call with the shared retry logic, or reuse makeGraphApiCallWithHeaders from the client module.

Example change:

-    const contentResponse = await graphClient.client
-      .api(`/me/drive/items/${fileId}/content`)
-      .header("Accept", "text/plain")
-      .get()
+    // import { makeGraphApiCallWithHeaders } from "./client"
+    const contentResponse = await makeGraphApiCallWithHeaders(
+      graphClient,
+      `/me/drive/items/${fileId}/content`,
+      { Accept: "text/plain" },
+    )

249-256: Comment contradicts behavior for deletions.

Comment says “Skip…deleted”, but code returns true. Clarify intent.

Apply this diff:

-  // Skip if it's a deleted item and we don't have it in our system
-  if (change.deleted) {
-    return true // We need to process deletions to remove from Vespa
-  }
+  // Process deletions to remove docs from Vespa
+  if (change.deleted) return true

366-371: PDF pipeline: avoid non-null assertions, fill permissions/owner, and use Logger.

  • Don’t assert driveId!; it’s optional.
  • Populate permissions via getFilePermissions.
  • Prefer Logger.error over console.error.
  • Guard date parsing.

Apply this diff:

-      const pdfBuffer = await downloadFileFromGraph(
-        graphClient,
-        file.id!,
-        file.parentReference?.driveId!,
-      )
+      const pdfBuffer = await downloadFileFromGraph(
+        graphClient,
+        file.id!,
+        file.parentReference?.driveId,
+      )
@@
-      await fs.promises.writeFile(pdfPath, new Uint8Array(pdfBuffer))
+      await fs.promises.writeFile(pdfPath, new Uint8Array(pdfBuffer))
@@
-      await deleteDocument(pdfPath)
+      await deleteDocument(pdfPath)
@@
-      results.push({
+      const permissions = await getFilePermissions(
+        graphClient,
+        file.id!,
+        file.parentReference?.driveId,
+      )
+      const createdAt = file.createdDateTime
+        ? new Date(file.createdDateTime).getTime()
+        : Date.now()
+      const updatedAt = file.lastModifiedDateTime
+        ? new Date(file.lastModifiedDateTime).getTime()
+        : createdAt
+
+      results.push({
         title: file.name!,
         url: file.webUrl ?? "",
         app: Apps.MicrosoftDrive,
         docId: file.id!,
         parentId: file.parentReference?.id ?? null,
-        owner: "", // Extract from file.createdBy if available
+        owner: file.createdBy?.user?.displayName ?? "",
         photoLink: "",
         ownerEmail: userEmail,
         entity: DriveEntity.PDF,
         chunks: chunks.map((c: any) => c.chunk),
-        permissions: [], // Process file.permissions if available
+        permissions,
         mimeType: file.file?.mimeType ?? "",
         metadata: JSON.stringify({
           parentPath: file.parentReference?.path,
           size: file.size,
         }),
-        createdAt: new Date(file.createdDateTime!).getTime(),
-        updatedAt: new Date(file.lastModifiedDateTime!).getTime(),
+        createdAt,
+        updatedAt,
       })

And replace console.error with:

-      console.error(`Error processing PDF ${file.name}:`, error)
+      Logger.error(error, `Error processing PDF ${file.name}: ${error}`)

Also applies to: 376-393, 395-415


435-444: DOCX pipeline: same fixes as PDFs (permissions/owner/dates and Logger).

Apply this diff:

-      const docxBuffer = await downloadFileFromGraph(
-        graphClient,
-        file.id!,
-        file.parentReference?.driveId!,
-      )
+      const docxBuffer = await downloadFileFromGraph(
+        graphClient,
+        file.id!,
+        file.parentReference?.driveId,
+      )
@@
-      results.push({
+      const permissions = await getFilePermissions(
+        graphClient,
+        file.id!,
+        file.parentReference?.driveId,
+      )
+      const createdAt = file.createdDateTime
+        ? new Date(file.createdDateTime).getTime()
+        : Date.now()
+      const updatedAt = file.lastModifiedDateTime
+        ? new Date(file.lastModifiedDateTime).getTime()
+        : createdAt
+
+      results.push({
         title: file.name!,
         url: file.webUrl ?? "",
         app: Apps.MicrosoftDrive,
         docId: file.id!,
         parentId: file.parentReference?.id ?? null,
-        owner: "",
+        owner: file.createdBy?.user?.displayName ?? "",
         photoLink: "",
         ownerEmail: userEmail,
         entity: DriveEntity.Docs, // Reuse Google's entity types
         chunks: extractedContent.text_chunks || [], // Use text_chunks property
-        permissions: [],
+        permissions,
         mimeType: file.file?.mimeType ?? "",
         metadata: JSON.stringify({
           parentPath: file.parentReference?.path,
           size: file.size,
           images: extractedContent.image_chunks?.length || 0, // Use image_chunks property
         }),
-        createdAt: new Date(file.createdDateTime!).getTime(),
-        updatedAt: new Date(file.lastModifiedDateTime!).getTime(),
+        createdAt,
+        updatedAt,
       })

And:

-      console.error(`Error processing Word document ${file.name}:`, error)
+      Logger.error(error, `Error processing Word document ${file.name}: ${error}`)

Also applies to: 445-466


476-493: Excel pipeline: keep logging consistent and guard dates.

If you keep this path commented-out elsewhere, these changes are still safe.

Apply this diff:

-      const workbook = await graphClient
+      const workbook = await graphClient
         .api(`${base}/${file.id}/workbook/worksheets`)
         .get()
@@
-          console.error(
-            `Error processing worksheet ${worksheet.name}:`,
-            worksheetError,
-          )
+          Logger.error(
+            worksheetError as Error,
+            `Error processing worksheet ${worksheet.name}: ${worksheetError}`,
+          )
           continue
@@
-      results.push({
+      const createdAt = file.createdDateTime
+        ? new Date(file.createdDateTime).getTime()
+        : Date.now()
+      const updatedAt = file.lastModifiedDateTime
+        ? new Date(file.lastModifiedDateTime).getTime()
+        : createdAt
+
+      results.push({
         title: file.name!,
         url: file.webUrl ?? "",
         app: Apps.MicrosoftDrive,
         docId: file.id!,
         parentId: file.parentReference?.id ?? null,
         owner: "",
         photoLink: "",
         ownerEmail: userEmail,
         entity: DriveEntity.Sheets,
         chunks,
         permissions: [],
         mimeType: file.file?.mimeType ?? "",
         metadata: JSON.stringify({
           parentPath: file.parentReference?.path,
           size: file.size,
           worksheetCount: workbook.value?.length || 0,
         }),
-        createdAt: new Date(file.createdDateTime!).getTime(),
-        updatedAt: new Date(file.lastModifiedDateTime!).getTime(),
+        createdAt,
+        updatedAt,
       })

And:

-      console.error(`Error processing Excel file ${file.name}:`, error)
+      Logger.error(error as Error, `Error processing Excel file ${file.name}: ${error}`)

Also applies to: 496-527, 529-551


559-601: processFileContent: Excel path is disabled.

If intentional, add a brief comment/TODO to track enabling later.

Apply this diff:

       case "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet":
-        // const excelResults = await processMicrosoftExcel(
+        // TODO: Enable Excel extraction when finalized
+        // const excelResults = await processMicrosoftExcel(
         //   graphClient.client,
         //   [file],
         //   userEmail,
         // )
         // return excelResults[0]?.chunks || []
         return []
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 987adbd and bdf4230.

⛔ Files ignored due to path filters (3)
  • bun.lock is excluded by !**/*.lock
  • frontend/bun.lockb is excluded by !**/bun.lockb
  • server/bun.lockb is excluded by !**/bun.lockb
📒 Files selected for processing (16)
  • frontend/.env.default (0 hunks)
  • frontend/src/routes/_authenticated/admin/integrations/microsoft.tsx (4 hunks)
  • package.json (1 hunks)
  • server/api/admin.ts (4 hunks)
  • server/db/connector.ts (5 hunks)
  • server/integrations/microsoft/client.ts (8 hunks)
  • server/integrations/microsoft/config.ts (1 hunks)
  • server/integrations/microsoft/index.ts (3 hunks)
  • server/integrations/microsoft/sharepoint/index.ts (1 hunks)
  • server/integrations/microsoft/sync.ts (11 hunks)
  • server/integrations/microsoft/utils.ts (3 hunks)
  • server/package.json (2 hunks)
  • server/queue/index.ts (2 hunks)
  • server/server.ts (6 hunks)
  • server/types.ts (5 hunks)
  • server/utils.ts (4 hunks)
💤 Files with no reviewable changes (1)
  • frontend/.env.default
🧰 Additional context used
🧠 Learnings (3)
📚 Learning: 2025-05-28T10:47:41.020Z
Learnt from: naSim087
PR: xynehq/xyne#484
File: server/integrations/google/sync.ts:222-222
Timestamp: 2025-05-28T10:47:41.020Z
Learning: The functions `handleGoogleDriveChange` and `getDriveChanges` in `server/integrations/google/sync.ts` are intentionally exported for future changes, even though they are not currently being imported by other modules.

Applied to files:

  • server/integrations/microsoft/utils.ts
  • server/integrations/microsoft/index.ts
  • server/queue/index.ts
  • server/integrations/microsoft/sync.ts
📚 Learning: 2025-09-12T13:28:43.782Z
Learnt from: devesh-juspay
PR: xynehq/xyne#821
File: server/server.ts:771-837
Timestamp: 2025-09-12T13:28:43.782Z
Learning: All workflow API endpoints should be protected with AuthMiddleware to prevent unauthorized access to template creation, execution, tool management, and other sensitive workflow operations.

Applied to files:

  • server/server.ts
📚 Learning: 2025-06-10T05:40:04.427Z
Learnt from: naSim087
PR: xynehq/xyne#525
File: frontend/src/routes/_authenticated/admin/integrations/slack.tsx:141-148
Timestamp: 2025-06-10T05:40:04.427Z
Learning: In frontend/src/routes/_authenticated/admin/integrations/slack.tsx, the ConnectAction enum and related connectAction state (lines 141-148, 469-471) are intentionally kept for future development, even though they appear unused in the current implementation.

Applied to files:

  • frontend/src/routes/_authenticated/admin/integrations/microsoft.tsx
🧬 Code graph analysis (11)
server/integrations/microsoft/sharepoint/index.ts (6)
server/integrations/microsoft/client.ts (2)
  • MicrosoftGraphClient (23-37)
  • makeGraphApiCall (98-112)
server/integrations/microsoft/index.ts (1)
  • loggerWithChild (82-84)
server/integrations/tracker.ts (1)
  • Tracker (89-325)
server/integrations/microsoft/utils.ts (2)
  • getFilePermissions (278-353)
  • processFileContent (559-601)
server/shared/types.ts (2)
  • Apps (38-38)
  • DriveEntity (33-33)
server/search/vespa.ts (1)
  • insertWithRetry (171-171)
server/types.ts (1)
server/db/oauthProvider.ts (1)
  • createOAuthProvider (14-34)
server/integrations/microsoft/utils.ts (4)
server/integrations/google/index.ts (3)
  • downloadDir (2431-2431)
  • deleteDocument (1467-1484)
  • safeLoadPDF (2479-2495)
server/integrations/microsoft/client.ts (4)
  • MicrosoftGraphClient (23-37)
  • makeGraphApiCall (98-112)
  • downloadFileFromGraph (193-209)
  • Client (249-249)
server/utils.ts (1)
  • hashPdfFilename (325-332)
server/docxChunks.ts (1)
  • extractTextAndImagesWithChunksFromDocx (2412-2765)
server/server.ts (6)
server/api/search.ts (1)
  • chatTitleSchema (132-135)
server/api/chat/chat.ts (1)
  • GenerateChatTitleApi (7168-7215)
server/types.ts (2)
  • createOAuthProvider (187-225)
  • microsoftServiceSchema (227-233)
server/db/oauthProvider.ts (1)
  • createOAuthProvider (14-34)
server/api/admin.ts (2)
  • CreateOAuthProvider (356-440)
  • AddServiceConnectionMicrosoft (442-534)
server/errors/index.ts (1)
  • InitialisationError (99-108)
server/integrations/microsoft/index.ts (7)
server/db/schema/connectors.ts (2)
  • SelectConnector (146-146)
  • connectors (60-109)
server/integrations/tracker.ts (1)
  • Tracker (89-325)
server/types.ts (1)
  • MicrosoftServiceCredentials (461-468)
server/integrations/microsoft/client.ts (1)
  • createMicrosoftGraphClient (60-95)
server/integrations/metricStream.ts (2)
  • sendWebsocketMessage (11-16)
  • closeWs (5-7)
server/integrations/microsoft/sharepoint/index.ts (3)
  • discoverSharePointSites (12-47)
  • discoverSiteDrives (50-110)
  • processSiteDrives (113-247)
server/errors/index.ts (1)
  • CouldNotFinishJobSuccessfully (194-203)
server/db/connector.ts (5)
server/utils.ts (2)
  • IsGoogleApp (300-307)
  • IsMicrosoftApp (309-316)
server/types.ts (3)
  • OAuthCredentials (312-318)
  • TxnOrClient (310-310)
  • MicrosoftServiceCredentials (461-468)
server/db/schema/connectors.ts (2)
  • connectors (60-109)
  • selectConnectorSchema (135-139)
server/errors/index.ts (2)
  • NoOauthConnectorFound (290-295)
  • MissingOauthConnectorCredentialsError (262-271)
server/integrations/microsoft/utils.ts (1)
  • CustomServiceAuthProvider (99-129)
server/utils.ts (3)
server/integrations/microsoft/client.ts (2)
  • MicrosoftGraphClient (23-37)
  • updateMicrosoftGraphClient (39-57)
server/integrations/microsoft/config.ts (1)
  • scopes (2-5)
server/integrations/microsoft/utils.ts (1)
  • CustomServiceAuthProvider (99-129)
server/queue/index.ts (1)
server/integrations/microsoft/sync.ts (1)
  • handleMicrosoftServiceAccountChanges (1825-1925)
server/api/admin.ts (8)
server/integrations/microsoft/index.ts (2)
  • loggerWithChild (82-84)
  • handleMicrosoftServiceAccountIngestion (817-923)
server/db/user.ts (1)
  • getUserByEmail (98-107)
server/db/connector.ts (2)
  • db (3-3)
  • insertConnector (43-92)
server/errors/index.ts (3)
  • NoUserFound (322-327)
  • ConnectorNotCreated (338-343)
  • AddServiceConnectionError (330-335)
server/types.ts (2)
  • microsoftService (295-295)
  • MicrosoftServiceCredentials (461-468)
server/integrations/microsoft/utils.ts (1)
  • CustomServiceAuthProvider (99-129)
server/db/schema/connectors.ts (1)
  • selectConnectorSchema (135-139)
server/utils.ts (1)
  • getErrorMessage (106-109)
server/integrations/microsoft/sync.ts (10)
server/integrations/microsoft/client.ts (4)
  • makeGraphApiCall (98-112)
  • downloadFileFromGraph (193-209)
  • MicrosoftGraphClient (23-37)
  • createMicrosoftGraphClient (60-95)
server/integrations/microsoft/sharepoint/index.ts (3)
  • discoverSharePointSites (12-47)
  • discoverSiteDrives (50-110)
  • processSiteDrives (113-247)
server/integrations/microsoft/index.ts (1)
  • loggerWithChild (82-84)
server/utils.ts (1)
  • getErrorMessage (106-109)
server/search/vespa.ts (4)
  • getDocumentOrNull (63-63)
  • DeleteDocument (65-65)
  • UpdateDocumentPermissions (172-173)
  • insertWithRetry (171-171)
server/integrations/microsoft/utils.ts (2)
  • getFilePermissions (278-353)
  • processFileContent (559-601)
server/db/syncJob.ts (1)
  • getAppSyncJobs (35-45)
server/db/connector.ts (2)
  • db (3-3)
  • getMicrosoftAuthConnectorWithCredentials (307-368)
server/types.ts (1)
  • MicrosoftServiceCredentials (461-468)
server/db/syncHistory.ts (1)
  • insertSyncHistory (10-30)
frontend/src/routes/_authenticated/admin/integrations/microsoft.tsx (2)
frontend/src/api.ts (1)
  • api (4-4)
server/shared/types.ts (1)
  • Apps (38-38)
🔇 Additional comments (18)
server/package.json (1)

55-55: OK to bump @xyne/vespa-ts

No issue spotted with ^1.0.6 bump.

server/db/connector.ts (2)

167-179: Token expiry helper refactor looks good

Switch to IsExpired(app, expiresAt, buffer) is reasonable and used consistently for Google/Microsoft.


215-219: Good: expiry check now uses expiresAt from credentials

This aligns with the new IsExpired signature.

frontend/src/routes/_authenticated/admin/integrations/microsoft.tsx (2)

213-298: Admin-only auth-type UI: looks good

Conditional radio group for delegated vs app-only and tenantId-only when app-only is selected is sound. Disabled scopes input matches server default-scope behavior.

Confirm that non-admin users never see app-only and cannot post to the admin route from the client.


112-128: Status transitions per auth type are sensible

  • App-only: set OAuthConnecting immediately after backend confirms creation.
  • Delegated: set OAuth to proceed with modal OAuth.

LGTM.

server/queue/index.ts (1)

27-31: Microsoft service-account sync wiring added

Importing handleMicrosoftServiceAccountChanges is correct and consistent with sync worker usage.

server/types.ts (3)

295-295: Export type alias: OK

microsoftService export matches the new schema exposure.


384-389: SharePoint delta token type added

Schema looks appropriate; union update below includes it.


461-468: MicrosoftServiceCredentials type: OK

Matches usage in DB credentials and sync code.

server/server.ts (1)

787-791: Chat title endpoint addition: OK

Schema validation and handler wiring look correct.

server/integrations/microsoft/config.ts (1)

3-3: Switch to .default scope: LGTM

  • Works for app-only flows (application permissions).
  • Retaining offline_access is appropriate for delegated OAuth to obtain refresh tokens.
server/integrations/microsoft/sharepoint/index.ts (1)

131-131: Duplicate drive name in log message.

The log message uses siteDrive.name for both the drive name and site name, which appears to be incorrect.

[raise_minor_issue]

  loggerWithChild({ email: userEmail }).info(
-   `Processing drive: ${siteDrive.name} from site: ${siteDrive.name}`,
+   `Processing drive: ${siteDrive.name}`,
  )
server/integrations/microsoft/sync.ts (1)

1825-1925: LGTM! Well-structured service account sync implementation.

The implementation follows the established pattern from OAuth sync, properly handles errors, updates sync job status, and creates sync history records appropriately.

server/utils.ts (1)

216-251: Well-structured multi-provider authentication handling.

The implementation properly distinguishes between OAuth/delegated and service/app-only authentication for Microsoft, with appropriate error handling for missing credentials.

server/integrations/microsoft/index.ts (1)

817-923: LGTM! Well-structured service account ingestion implementation.

The implementation includes proper error handling, progress tracking via websockets, transaction management, and comprehensive logging with job IDs for traceability.

server/integrations/microsoft/utils.ts (2)

52-69: LGTM: robust PDF loader with password handling.


98-129: LGTM: service auth provider using ClientSecretCredential.

Good separation for app-only flows; Azure Identity caches tokens.

server/integrations/microsoft/client.ts (1)

97-112: Incorrect — retryWithBackoff does not log/serialize the authClient; passing graphClient is intended.

retryWithBackoff only logs the caught error and the context string (Logger.warn(error); Logger.info/error with context) and uses the optional authClient to refresh tokens on 401. It does not serialize or log the authClient parameter, so the specific clientSecret-leak concern is unfounded. For extra hardening, avoid including secrets in error objects or pass a redacted object (e.g., { tenantId, clientId }) / ensure logger redaction.

Likely an incorrect or invalid review comment.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
server/db/connector.ts (1)

167-179: Defensively handle invalid/missing expiresAt; treat as expired.

With an invalid Date, getTime() is NaN and the function returns false, skipping refresh. Force-refresh on invalid expiration.

-const IsExpired = (
-  app: Apps,
-  expiresAt: Date,
-  bufferInSeconds: number,
-): boolean => {
-  if (IsGoogleApp(app) || IsMicrosoftApp(app)) {
-    const now: Date = new Date()
-    const currentTime = now.getTime()
-    const expirationTime = new Date(expiresAt).getTime()
-    return currentTime + bufferInSeconds * 1000 > expirationTime
-  }
-  return false
-}
+const IsExpired = (
+  app: Apps,
+  expiresAt: Date | string | number | null | undefined,
+  bufferInSeconds: number,
+): boolean => {
+  if (!(IsGoogleApp(app) || IsMicrosoftApp(app))) return false
+  const expMs =
+    expiresAt instanceof Date
+      ? expiresAt.getTime()
+      : new Date(expiresAt as any).getTime()
+  if (!Number.isFinite(expMs)) {
+    // Invalid/missing expiry => force refresh
+    return true
+  }
+  return Date.now() + bufferInSeconds * 1000 >= expMs
+}
server/integrations/microsoft/sync.ts (1)

1624-1629: Fix mis-ordered createMicrosoftGraphClient arguments (critical)

createMicrosoftGraphClient(accessToken, clientId, clientSecret, refreshToken?, tenantId?, tokenExpiresAt?). Several call sites pass refreshToken as the 2nd argument (or tenantId as the 4th), which places values into clientId/clientSecret slots and breaks authentication. Fix calls to pass clientId/clientSecret as args 2/3, refreshToken as arg 4, tenantId as arg 5.

  • server/integrations/microsoft/sync.ts (lines ~1624–1627) — replace
    const graphClient = createMicrosoftGraphClient(
    oauthTokens.access_token,
    oauthTokens.refresh_token,
    MICROSOFT_CLIENT_ID,
    MICROSOFT_CLIENT_SECRET,
    )
    with
    const graphClient = createMicrosoftGraphClient(
    oauthTokens.access_token,
    MICROSOFT_CLIENT_ID,
    MICROSOFT_CLIENT_SECRET,
    oauthTokens.refresh_token,
    )

  • server/integrations/microsoft/sync.ts (lines ~1730–1733) — same replacement as above.

  • server/integrations/microsoft/index.ts (lines ~956–959) — replace
    const graphClient = createMicrosoftGraphClient(
    oauthTokens.access_token,
    oauthTokens.refresh_token,
    microsoftProvider.clientId!,
    microsoftProvider.clientSecret! as string,
    )
    with
    const graphClient = createMicrosoftGraphClient(
    oauthTokens.access_token,
    microsoftProvider.clientId!,
    microsoftProvider.clientSecret! as string,
    oauthTokens.refresh_token,
    )

  • server/integrations/microsoft/index.ts (lines ~832–835) — service account call currently passes tenantId as the 4th arg; replace
    const graphClient = createMicrosoftGraphClient(
    credentials.access_token,
    credentials.clientId,
    credentials.clientSecret,
    credentials.tenantId,
    )
    with
    const graphClient = createMicrosoftGraphClient(
    credentials.access_token,
    credentials.clientId,
    credentials.clientSecret,
    undefined,
    credentials.tenantId,
    )

🧹 Nitpick comments (7)
server/db/connector.ts (4)

38-40: Remove stray/unused imports (zod date, node:test todo).

These appear accidental in a DB module.

 import { CustomServiceAuthProvider } from "@/integrations/microsoft/utils"
-import { date } from "zod"
-import { todo } from "node:test"

315-319: Constrain to Microsoft apps in the query.

Function name suggests Microsoft-only; add an app filter to avoid matching other service accounts.

       and(
         eq(connectors.id, connectorId),
         eq(connectors.authType, AuthType.ServiceAccount),
+        inArray(connectors.app, [
+          Apps.MicrosoftDrive,
+          Apps.MicrosoftOutlook,
+          Apps.MicrosoftCalendar,
+          Apps.MicrosoftSharepoint,
+        ]),
       ),

Add the import (outside this hunk):

-import { and, eq } from "drizzle-orm"
+import { and, eq, inArray } from "drizzle-orm"

322-326: Use a more accurate error for service-account connectors.

These aren’t OAuth connectors; prefer NoConnectorsFound with a precise message.

-  if (!res.length) {
-    throw new NoOauthConnectorFound({
-      message: `Could not get the oauth connector with id:  ${connectorId}`,
-    })
-  }
+  if (!res.length) {
+    throw new NoConnectorsFound({
+      message: `Could not get the service-account connector with id: ${connectorId}`,
+    })
+  }

338-356: Avoid refresh stampede under concurrency.

Multiple workers can detect expiry and refresh simultaneously. Consider row-level locking (SELECT ... FOR UPDATE) or an atomic compare-and-swap on expires_at in the UPDATE to serialize refresh.

server/integrations/microsoft/sync.ts (3)

1302-1305: Avoid logging raw deltaLink URLs/tokens.

Delta URLs can embed opaque tokens. Don’t log them.

-    loggerWithChild({ email }).info(
-      `Fetching delta changes from: ${nextLink.substring(0, 100)}...`,
-    )
+    loggerWithChild({ email }).info(
+      `Fetching SharePoint drive delta changes (URL redacted)`,
+    )

1188-1190: Implement cleanup for deleted drives.

Currently a TODO. Without cleanup, Vespa retains orphaned docs when drives are removed.

High-level approach:

  • Track docs by siteId::driveId composite key in metadata.
  • On deletion, bulk remove matching docs (or queue a background cleanup job).

I can draft a deleteDriveDocuments(siteAndDriveKeys: string[]) helper using Vespa queries if you want.


94-116: Defer env var validation to runtime call sites.

Top‑level throws on missing MICROSOFT_CLIENT_ID/SECRET can crash unrelated workers/tests that import this module but don’t use Microsoft integrations.

  • Move validateMicrosoftCredentials() into the functions that actually need it (e.g., inside OAuth handlers) or guard it behind a feature flag check.
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between bdf4230 and 4ca07da.

📒 Files selected for processing (2)
  • server/db/connector.ts (5 hunks)
  • server/integrations/microsoft/sync.ts (11 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-05-28T10:47:41.020Z
Learnt from: naSim087
PR: xynehq/xyne#484
File: server/integrations/google/sync.ts:222-222
Timestamp: 2025-05-28T10:47:41.020Z
Learning: The functions `handleGoogleDriveChange` and `getDriveChanges` in `server/integrations/google/sync.ts` are intentionally exported for future changes, even though they are not currently being imported by other modules.

Applied to files:

  • server/integrations/microsoft/sync.ts
🧬 Code graph analysis (2)
server/integrations/microsoft/sync.ts (10)
server/integrations/microsoft/client.ts (4)
  • makeGraphApiCall (98-112)
  • downloadFileFromGraph (193-209)
  • MicrosoftGraphClient (23-37)
  • createMicrosoftGraphClient (60-95)
server/integrations/microsoft/sharepoint/index.ts (3)
  • discoverSharePointSites (12-47)
  • discoverSiteDrives (50-110)
  • processSiteDrives (113-247)
server/integrations/microsoft/index.ts (1)
  • loggerWithChild (82-84)
server/utils.ts (1)
  • getErrorMessage (106-109)
server/search/vespa.ts (4)
  • getDocumentOrNull (63-63)
  • DeleteDocument (65-65)
  • UpdateDocumentPermissions (172-173)
  • insertWithRetry (171-171)
server/integrations/microsoft/utils.ts (2)
  • getFilePermissions (278-353)
  • processFileContent (559-601)
server/db/syncJob.ts (2)
  • getAppSyncJobs (35-45)
  • updateSyncJob (65-87)
server/db/connector.ts (2)
  • db (3-3)
  • getMicrosoftAuthConnectorWithCredentials (307-368)
server/types.ts (1)
  • MicrosoftServiceCredentials (461-468)
server/db/syncHistory.ts (1)
  • insertSyncHistory (10-30)
server/db/connector.ts (5)
server/utils.ts (2)
  • IsGoogleApp (300-307)
  • IsMicrosoftApp (309-316)
server/types.ts (3)
  • OAuthCredentials (312-318)
  • TxnOrClient (310-310)
  • MicrosoftServiceCredentials (461-468)
server/db/schema/connectors.ts (2)
  • connectors (60-109)
  • selectConnectorSchema (135-139)
server/errors/index.ts (2)
  • NoOauthConnectorFound (290-295)
  • MissingOauthConnectorCredentialsError (262-271)
server/integrations/microsoft/utils.ts (1)
  • CustomServiceAuthProvider (99-129)
🔇 Additional comments (6)
server/db/connector.ts (4)

14-18: Import realignment looks good.

Type-only imports from "@/types" are correct and moving ConnectorType out of here avoids duplication.


21-21: Using shared ConnectorType enum is correct.

Good move to source ConnectorType from "@/shared/types".


215-219: Callsite updated to pass Date-based expiry — OK.


354-356: Refreshed token is now persisted correctly.

You serialize the updated credentials object, addressing the earlier bug.

server/integrations/microsoft/sync.ts (2)

1164-1166: Good: no hardcoded site-name filters.

Site discovery no longer hardcodes allowlists like "VISA_SIN" etc. This addresses prior review feedback.


1412-1414: Incorrect — driveId is already handled via file.parentReference (no change required)

processFileContent delegates to processMicrosoftPDFs/processMicrosoftWord which call downloadFileFromGraph(..., file.parentReference?.driveId), and client.ts switches to the drives/{driveId}/items/{fileId}/content endpoint when a driveId is provided. See server/integrations/microsoft/utils.ts (processMicrosoftPDFs/processMicrosoftWord) and server/integrations/microsoft/client.ts (downloadUrl/driveId handling).

Likely an incorrect or invalid review comment.

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
server/integrations/microsoft/sync.ts (2)

1729-1734: createMicrosoftGraphClient args out of order (Calendar)

Same ordering bug here.

-      const graphClient = createMicrosoftGraphClient(
-        oauthTokens.access_token,
-        oauthTokens.refresh_token,
-        MICROSOFT_CLIENT_ID,
-        MICROSOFT_CLIENT_SECRET,
-      )
+      const graphClient = createMicrosoftGraphClient(
+        oauthTokens.access_token,
+        MICROSOFT_CLIENT_ID,
+        MICROSOFT_CLIENT_SECRET,
+        oauthTokens.refresh_token,
+      )

731-744: DeletedItems delta endpoint defaults to empty string

When there’s no prior token, you call makeGraphApiCall with "", which will fail. Provide a proper initial delta endpoint.

-      let deletedItemsEndpoint: string = ""
-
-      if (deletedItemsDeltaToken && deletedItemsDeltaToken.startsWith("http")) {
-        // Use existing delta token URL
-        const url = new URL(deletedItemsDeltaToken)
-        deletedItemsEndpoint = url.pathname + url.search
-      }
+      let deletedItemsEndpoint: string
+      if (deletedItemsDeltaToken && deletedItemsDeltaToken.startsWith("http")) {
+        const url = new URL(deletedItemsDeltaToken)
+        deletedItemsEndpoint = url.pathname + url.search
+      } else {
+        deletedItemsEndpoint =
+          "/me/mailFolders/deleteditems/messages/delta" +
+          "?$top=100&$select=id,internetMessageId"
+      }
🧹 Nitpick comments (2)
server/api/admin.ts (1)

127-133: Remove unused test/deps imports

These imports are unused here and can confuse bundlers/tree-shaking.

-import { todo } from "node:test"
-import { ClientSecretCredential } from "@azure/identity"
-import { Client as GraphClient } from "@microsoft/microsoft-graph-client"
-import type { AuthenticationProvider } from "@microsoft/microsoft-graph-client"
server/integrations/microsoft/sync.ts (1)

346-350: Use makeGraphApiCall for fetch/retry consistency

Direct graphClient.client.api().get() bypasses your retry/backoff and token-refresh path.

-        const contentResponse = await graphClient.client
-          .api(`/me/drive/items/${file.id}/content`)
-          .header("Accept", "text/plain")
-          .get()
+        const contentResponse = await makeGraphApiCall(
+          graphClient,
+          `/me/drive/items/${file.id}/content`,
+        )
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4ca07da and 4fb9bf0.

📒 Files selected for processing (5)
  • server/api/admin.ts (4 hunks)
  • server/integrations/microsoft/sync.ts (11 hunks)
  • server/package.json (1 hunks)
  • server/server.ts (6 hunks)
  • server/types.ts (5 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • server/types.ts
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-09-12T13:28:43.782Z
Learnt from: devesh-juspay
PR: xynehq/xyne#821
File: server/server.ts:771-837
Timestamp: 2025-09-12T13:28:43.782Z
Learning: All workflow API endpoints should be protected with AuthMiddleware to prevent unauthorized access to template creation, execution, tool management, and other sensitive workflow operations.

Applied to files:

  • server/server.ts
📚 Learning: 2025-05-28T10:47:41.020Z
Learnt from: naSim087
PR: xynehq/xyne#484
File: server/integrations/google/sync.ts:222-222
Timestamp: 2025-05-28T10:47:41.020Z
Learning: The functions `handleGoogleDriveChange` and `getDriveChanges` in `server/integrations/google/sync.ts` are intentionally exported for future changes, even though they are not currently being imported by other modules.

Applied to files:

  • server/integrations/microsoft/sync.ts
🧬 Code graph analysis (3)
server/server.ts (4)
server/types.ts (2)
  • createOAuthProvider (188-226)
  • microsoftServiceSchema (228-234)
server/db/oauthProvider.ts (1)
  • createOAuthProvider (14-34)
server/api/admin.ts (2)
  • CreateOAuthProvider (371-455)
  • AddServiceConnectionMicrosoft (457-549)
server/errors/index.ts (1)
  • InitialisationError (99-108)
server/integrations/microsoft/sync.ts (10)
server/integrations/microsoft/client.ts (4)
  • makeGraphApiCall (98-112)
  • downloadFileFromGraph (193-209)
  • MicrosoftGraphClient (23-37)
  • createMicrosoftGraphClient (60-95)
server/integrations/microsoft/sharepoint/index.ts (3)
  • discoverSharePointSites (12-47)
  • discoverSiteDrives (50-110)
  • processSiteDrives (113-247)
server/integrations/microsoft/index.ts (1)
  • loggerWithChild (82-84)
server/utils.ts (1)
  • getErrorMessage (106-109)
server/search/vespa.ts (4)
  • getDocumentOrNull (65-65)
  • DeleteDocument (67-67)
  • UpdateDocumentPermissions (213-214)
  • insertWithRetry (212-212)
server/integrations/microsoft/utils.ts (2)
  • getFilePermissions (278-353)
  • processFileContent (559-601)
server/db/syncJob.ts (2)
  • getAppSyncJobs (35-45)
  • updateSyncJob (65-87)
server/db/connector.ts (2)
  • db (3-3)
  • getMicrosoftAuthConnectorWithCredentials (307-368)
server/types.ts (1)
  • MicrosoftServiceCredentials (462-469)
server/db/syncHistory.ts (1)
  • insertSyncHistory (10-30)
server/api/admin.ts (8)
server/integrations/microsoft/index.ts (2)
  • loggerWithChild (82-84)
  • handleMicrosoftServiceAccountIngestion (817-923)
server/db/user.ts (1)
  • getUserByEmail (145-154)
server/db/connector.ts (2)
  • db (3-3)
  • insertConnector (43-92)
server/errors/index.ts (3)
  • NoUserFound (322-327)
  • ConnectorNotCreated (338-343)
  • AddServiceConnectionError (330-335)
server/types.ts (2)
  • microsoftService (296-296)
  • MicrosoftServiceCredentials (462-469)
server/integrations/microsoft/utils.ts (1)
  • CustomServiceAuthProvider (99-129)
server/db/schema/connectors.ts (1)
  • selectConnectorSchema (135-139)
server/utils.ts (1)
  • getErrorMessage (106-109)
🔇 Additional comments (9)
server/package.json (1)

49-49: Bun compatibility risk: @azure/identity v4 not officially supported

This dependency is known to target Node LTS and browsers; Bun support is unofficial and has reported incompatibilities. If this service runs under Bun, expect runtime issues unless you add shims/polyfills or switch to Node for the server process.

Would you like me to add runtime checks/docs to enforce Node LTS for the server and keep Bun for tooling only?

server/api/admin.ts (2)

466-468: Replace ts-ignore with a type assertion

Don’t suppress types; assert the validated shape.

-  // @ts-ignore
-  const form: microsoftService = c.req.valid("form")
+  const form = c.req.valid("form") as microsoftService

520-520: Background ingestion must handle rejections

Fire-and-forget without .catch risks unhandled rejections and silent failures. Mirror the Slack/Google pattern.

Apply:

-    handleMicrosoftServiceAccountIngestion(email, connector)
+    handleMicrosoftServiceAccountIngestion(email, connector).catch((error) => {
+      loggerWithChild({ email }).error(
+        error,
+        `Background Microsoft service account ingestion failed for connector ${connector.id}: ${getErrorMessage(error)}`,
+      )
+    })
server/server.ts (2)

1507-1510: Queue init error handling is fine

initQueue() is awaited via .catch and will surface as InitialisationError early. LGTM.


1057-1061: microsoftServiceSchema likely rejects valid forms (scopes superRefine bleed-through)

microsoftServiceSchema derives from createOAuthProvider, whose superRefine enforces scopes when isUsingGlobalCred is false. Your admin POST /microsoft/service_account submits only clientId, clientSecret, tenantId, app and will be rejected.

Fix in server/types.ts by defining an independent schema without the inherited superRefine:

--- a/server/types.ts
+++ b/server/types.ts
@@
 export const createOAuthProvider = z
   .object({
@@
   })
   .superRefine((data, ctx) => {
@@
   })

-// Service-account schema derived from createOAuthProvider (inherits superRefine)
-export const microsoftServiceSchema = createOAuthProvider
-  .omit({
-    scopes: true,
-  })
-  .extend({
-    tenantId: z.string(),
-  })
+// Independent service-account schema for Microsoft (no scopes required)
+export const microsoftServiceSchema = z.strictObject({
+  clientId: z.string().min(1),
+  clientSecret: z.string().min(1),
+  tenantId: z.string().min(1),
+  app: z.nativeEnum(Apps).default(Apps.MicrosoftSharepoint),
+})
server/integrations/microsoft/sync.ts (4)

1480-1485: createMicrosoftGraphClient args out of order (OAuth)

Signature is (accessToken, clientId, clientSecret, refreshToken?, tenantId?, tokenExpiresAt?). You’re passing refresh_token as clientId.

Apply:

-      const graphClient = createMicrosoftGraphClient(
-        oauthTokens.access_token,
-        oauthTokens.refresh_token,
-        MICROSOFT_CLIENT_ID,
-        MICROSOFT_CLIENT_SECRET,
-      )
+      const graphClient = createMicrosoftGraphClient(
+        oauthTokens.access_token,
+        MICROSOFT_CLIENT_ID,
+        MICROSOFT_CLIENT_SECRET,
+        oauthTokens.refresh_token,
+      )

Likely an incorrect or invalid review comment.


1421-1431: Do not persist @microsoft.graph.downloadUrl

This is a short-lived pre-authorized URL; storing it increases leakage risk.

       metadata: JSON.stringify({
         size: item.size,
-        downloadUrl: item["@microsoft.graph.downloadUrl"],
         siteId: siteId,
         driveId: driveId,
         parentId: item.parentReference?.id ?? "",
         parentPath: item.parentReference?.path ?? "/",
         eTag: item.eTag ?? "",
       }),

83-83: Invalid import: zod has no named export 'unknown'

This breaks build. Remove it (and use z.unknown() if needed).

-import { unknown } from "zod"

1241-1250: Preserve deltaLink when Graph returns none

Overwriting with empty string loses a valid token.

-      //update with new Link
-      deltaLinks[driveKey] = newDeltaLink
+      // Update only if a new deltaLink was returned
+      if (newDeltaLink) {
+        deltaLinks[driveKey] = newDeltaLink
+      } else {
+        loggerWithChild({ email }).warn(
+          `No new deltaLink for drive ${driveKey}; preserving existing token`,
+        )
+      }

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 5

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (2)
server/integrations/microsoft/sync.ts (2)

1612-1617: Fix createMicrosoftGraphClient argument order (Outlook).

Parameters are misordered; clientId and clientSecret must be 2nd and 3rd, refreshToken 4th.

Apply this diff:

-      const graphClient = createMicrosoftGraphClient(
-        oauthTokens.access_token,
-        oauthTokens.refresh_token,
-        MICROSOFT_CLIENT_ID,
-        MICROSOFT_CLIENT_SECRET,
-      )
+      const graphClient = createMicrosoftGraphClient(
+        oauthTokens.access_token,
+        MICROSOFT_CLIENT_ID,
+        MICROSOFT_CLIENT_SECRET,
+        oauthTokens.refresh_token,
+      )

1718-1723: Fix createMicrosoftGraphClient argument order (Calendar).

Same misordering here.

Apply this diff:

-      const graphClient = createMicrosoftGraphClient(
-        oauthTokens.access_token,
-        oauthTokens.refresh_token,
-        MICROSOFT_CLIENT_ID,
-        MICROSOFT_CLIENT_SECRET,
-      )
+      const graphClient = createMicrosoftGraphClient(
+        oauthTokens.access_token,
+        MICROSOFT_CLIENT_ID,
+        MICROSOFT_CLIENT_SECRET,
+        oauthTokens.refresh_token,
+      )
🧹 Nitpick comments (8)
server/package.json (1)

68-68: Type-only package can live in devDependencies

@types/pdf-parse supplies compile-time typings only; pulling it into dependencies bloats production installs with no runtime benefit. Let’s move it to devDependencies.

   "devDependencies": {
     "@biomejs/biome": "1.9.4",
@@
-    "p-queue": "^8.0.1",
+    "p-queue": "^8.0.1",
+    "@types/pdf-parse": "^1.1.5",
     "semantic-release": "^24.2.7",
@@
-    "@types/pdf-parse": "^1.1.5",
server/integrations/microsoft/utils.ts (4)

415-418: Use consistent logger instead of console.error.

Prefer Logger.error for structured logs.

Apply this diff:

-      console.error(`Error processing PDF ${file.name}:`, error)
+      Logger.error(error, `Error processing PDF ${file.name}: ${error}`)

521-525: Use consistent logger instead of console.error.

Apply this diff:

-          console.error(
-            `Error processing worksheet ${worksheet.name}:`,
-            worksheetError,
-          )
+          Logger.error(
+            worksheetError,
+            `Error processing worksheet ${worksheet.name}: ${worksheetError}`,
+          )

539-539: Fix entity type for Excel files.

Use the specific Excel enum for consistency.

Apply this diff:

-        entity: DriveEntity.Sheets,
+        entity: DriveEntity.ExcelSpreadsheet,

163-171: Consider using the retry/backoff wrapper for Graph calls.

Direct SDK call here bypasses retryWithBackoff; consider makeGraphApiCall if you can attach headers via the wrapper or extend it to support headers.

server/integrations/microsoft/sharepoint/index.ts (1)

145-152: Skip folders during ingestion.

Folder items are being ingested as files with empty chunks. Skip them to avoid noisy documents.

Apply this diff:

-          if (response.value && Array.isArray(response.value)) {
+          if (response.value && Array.isArray(response.value)) {
             for (const item of response.value) {
+              if (item.folder) {
+                continue
+              }
                 try {
server/integrations/microsoft/sync.ts (2)

1174-1179: Avoid mutating deltaLinks while iterating its keys.

Delete after collecting to prevent iteration pitfalls.

Apply this diff:

-  for (const key in deltaLinks) {
-    if (!driveSet.has(key)) {
-      deletedDrives.push(key)
-      delete deltaLinks[key]
-    }
-  }
+  const toDelete: string[] = []
+  for (const key in deltaLinks) {
+    if (!driveSet.has(key)) {
+      deletedDrives.push(key)
+      toDelete.push(key)
+    }
+  }
+  for (const key of toDelete) delete deltaLinks[key]

1358-1371: Don’t throw on “not found” during delete; log at warn level.

This isn’t exceptional for delta; reduce noise.

Apply this diff:

-    } else {
-      throw new Error("File not found in vespa")
-    }
+    } else {
+      Logger.warn(`SharePoint file ${fileId} not found in Vespa during delete`)
+    }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 4fb9bf0 and 52aaf2e.

⛔ Files ignored due to path filters (1)
  • server/bun.lockb is excluded by !**/bun.lockb
📒 Files selected for processing (4)
  • server/integrations/microsoft/sharepoint/index.ts (1 hunks)
  • server/integrations/microsoft/sync.ts (11 hunks)
  • server/integrations/microsoft/utils.ts (3 hunks)
  • server/package.json (2 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-05-28T10:47:41.020Z
Learnt from: naSim087
PR: xynehq/xyne#484
File: server/integrations/google/sync.ts:222-222
Timestamp: 2025-05-28T10:47:41.020Z
Learning: The functions `handleGoogleDriveChange` and `getDriveChanges` in `server/integrations/google/sync.ts` are intentionally exported for future changes, even though they are not currently being imported by other modules.

Applied to files:

  • server/integrations/microsoft/sharepoint/index.ts
  • server/integrations/microsoft/sync.ts
  • server/integrations/microsoft/utils.ts
🧬 Code graph analysis (3)
server/integrations/microsoft/sharepoint/index.ts (5)
server/integrations/microsoft/client.ts (2)
  • MicrosoftGraphClient (23-37)
  • makeGraphApiCall (98-112)
server/integrations/microsoft/index.ts (1)
  • loggerWithChild (82-84)
server/integrations/tracker.ts (1)
  • Tracker (89-325)
server/integrations/microsoft/utils.ts (2)
  • getFilePermissions (278-353)
  • processFileContent (559-601)
server/search/vespa.ts (1)
  • insertWithRetry (212-212)
server/integrations/microsoft/sync.ts (9)
server/integrations/microsoft/client.ts (4)
  • makeGraphApiCall (98-112)
  • downloadFileFromGraph (193-209)
  • MicrosoftGraphClient (23-37)
  • createMicrosoftGraphClient (60-95)
server/integrations/microsoft/sharepoint/index.ts (3)
  • discoverSharePointSites (12-47)
  • discoverSiteDrives (50-110)
  • processSiteDrives (113-243)
server/utils.ts (1)
  • getErrorMessage (106-109)
server/search/vespa.ts (3)
  • getDocumentOrNull (65-65)
  • DeleteDocument (67-67)
  • insertWithRetry (212-212)
server/integrations/microsoft/utils.ts (2)
  • getFilePermissions (278-353)
  • processFileContent (559-601)
server/db/syncJob.ts (2)
  • getAppSyncJobs (35-45)
  • updateSyncJob (65-87)
server/db/connector.ts (2)
  • db (3-3)
  • getMicrosoftAuthConnectorWithCredentials (307-368)
server/types.ts (1)
  • MicrosoftServiceCredentials (462-469)
server/db/syncHistory.ts (1)
  • insertSyncHistory (10-30)
server/integrations/microsoft/utils.ts (6)
server/integrations/google/index.ts (3)
  • downloadDir (2431-2431)
  • deleteDocument (1467-1484)
  • safeLoadPDF (2479-2495)
server/integrations/google/pdf-utils.ts (3)
  • downloadDir (11-11)
  • deleteDocument (13-30)
  • safeLoadPDF (60-77)
server/integrations/microsoft/client.ts (4)
  • MicrosoftGraphClient (23-37)
  • makeGraphApiCall (98-112)
  • downloadFileFromGraph (193-209)
  • Client (249-249)
server/utils.ts (1)
  • hashPdfFilename (325-332)
server/chunks.ts (1)
  • chunkDocument (123-133)
server/docxChunks.ts (1)
  • extractTextAndImagesWithChunksFromDocx (2412-2765)
🔇 Additional comments (7)
server/package.json (1)

48-48: Blocking: @azure/identity still fails on our Bun runtime

Reiterating the outstanding concern: the server scripts (Lines 33–34) continue to execute under Bun, yet @azure/identity@^4.12.0 only supports Node.js/browsers and is known to crash/hang in Bun (AbortController, process.versions, stream APIs). Shipping this as-is will break the new SharePoint auth flows whenever they execute inside Bun. Please either switch the runtime to Node LTS (and enforce/document it) or provide the Bun-specific shims/polyfills plus full verification before landing.

server/integrations/microsoft/utils.ts (1)

25-35: Always ensure download directory exists; add fallback for production.

Currently the dir is only ensured in non-prod, risking runtime failures in prod. Use deterministic dir with fallback to OS temp.

Apply this diff:

-// Download directory setup
-export const downloadDir = path.resolve(__dirname, "../../downloads")
-
-if (process.env.NODE_ENV !== "production") {
-  const init = () => {
-    if (!fs.existsSync(downloadDir)) {
-      fs.mkdirSync(downloadDir, { recursive: true })
-    }
-  }
-  init()
-}
+// Download directory setup
+export const downloadDir = process.env.DOWNLOAD_DIR
+  ? path.resolve(process.env.DOWNLOAD_DIR)
+  : path.resolve(__dirname, "../../downloads")
+
+try {
+  if (!fs.existsSync(downloadDir)) {
+    fs.mkdirSync(downloadDir, { recursive: true })
+  }
+} catch (e) {
+  // Fallback to OS temp if primary location is not writable
+  const fallback = path.resolve(os.tmpdir(), "downloads")
+  if (!fs.existsSync(fallback)) {
+    fs.mkdirSync(fallback, { recursive: true })
+  }
+  (exports as any).downloadDir = fallback
+}

Also add this import at the top of the file:

import os from "node:os"
server/integrations/microsoft/sharepoint/index.ts (2)

210-218: Guard on the actual delta token, not the object.

Using deltaLinks (the map) is always truthy; check deltaLink instead.

Apply this diff:

-        if (deltaLinks && siteDrive.id) {
+        if (deltaLink && siteDrive.id) {
           deltaLinks[`${siteDrive.sharePointIds?.siteId}::${siteDrive.id}`] =
             deltaLink

128-141: Validate sharePointIds.siteId and drive.id before constructing delta URL.

Avoid building invalid endpoints when IDs are missing; skip and log.

Apply this diff:

-        loggerWithChild({ email: userEmail }).info(
-          `Processing drive: ${siteDrive.name} from site: ${siteDrive.name}`,
-        )
+        if (!siteDrive.sharePointIds?.siteId || !siteDrive.id) {
+          loggerWithChild({ email: userEmail }).warn(
+            `Skipping drive ${siteDrive.name} - missing sharePointIds.siteId or drive.id`,
+          )
+          continue
+        }
+        loggerWithChild({ email: userEmail }).info(
+          `Processing drive: ${siteDrive.name} from site: ${siteDrive.name}`,
+        )
server/integrations/microsoft/sync.ts (3)

1249-1250: Don’t overwrite existing delta token with an empty value.

Preserve the prior token when no new @odata.deltaLink is returned.

Apply this diff:

-      deltaLinks[driveKey] = newDeltaLink
+      if (newDeltaLink) {
+        deltaLinks[driveKey] = newDeltaLink
+      } else {
+        loggerWithChild({ email }).warn(
+          `No new deltaLink for drive ${driveKey}; preserving existing token`,
+        )
+      }

1412-1419: Remove @microsoft.graph.downloadUrl from metadata.

Same security concern as above; do not persist pre-authorized URLs.

Apply this diff:

       metadata: JSON.stringify({
         size: item.size,
-        downloadUrl: item["@microsoft.graph.downloadUrl"],
         siteId: siteId,
         driveId: driveId,
         parentId: item.parentReference?.id ?? "",
         parentPath: item.parentReference?.path ?? "/",
         eTag: item.eTag ?? "",
       }),

83-83: Remove invalid Zod import.

zod has no named export unknown. This breaks the build.

Apply this diff:

-import { unknown } from "zod"

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
server/integrations/microsoft/utils.ts (1)

221-232: Don’t store pre‑authorized downloadUrl in metadata.

Remove to reduce leakage risk; fetch fresh URLs when needed.

 export const formatOneDriveMetadata = (file: any): string => {
   return JSON.stringify({
     parentPath: file.parentReference?.path ?? "",
     parentName: file.parentReference?.name ?? "",
     size: file.size ?? 0,
     webUrl: file.webUrl ?? "",
-    downloadUrl: file["@microsoft.graph.downloadUrl"] ?? "",
     lastModifiedBy: file.lastModifiedBy?.user?.displayName ?? "",
     createdBy: file.createdBy?.user?.displayName ?? "",
   })
 }
🧹 Nitpick comments (5)
server/server.ts (1)

1511-1514: Queue init failure currently crashes the server. Confirm desired behavior.

Throwing InitialisationError at top-level aborts startup. If queue is non-critical, prefer logging + health flag to avoid full outage.

-initQueue().catch((error) => {
-  throw new InitialisationError({ cause: error })
-})
+initQueue().catch((error) => {
+  Logger.error(error, "Queue initialisation failed")
+  // optionally expose degraded health instead of crashing
+})
server/integrations/microsoft/sync.ts (3)

1398-1411: Use mime map for entity instead of Misc.

Improves search facets/UX.

-      entity: DriveEntity.Misc,
+      entity:
+        (item.file?.mimeType && microsoftMimeTypeMap[item.file.mimeType]) ??
+        DriveEntity.Misc,

1353-1372: Avoid throwing on “not found in Vespa” during delete.

A missing doc isn’t exceptional during eventual consistency; downgrade to warn to reduce noise.

-    } else {
-      throw new Error("File not found in vespa")
-    }
+    }

1181-1186: TODO: handle deleted drives cleanup.

Delta link removal alone leaves orphaned docs. Consider scheduled purge by driveId/siteId.

server/integrations/microsoft/utils.ts (1)

355-423: Use Logger instead of console.error for consistency.

Aligns with centralized logging.

-      console.error(`Error processing PDF ${file.name}:`, error)
+      Logger.error(error, `Error processing PDF ${file.name}`)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 52aaf2e and 5ead1fa.

⛔ Files ignored due to path filters (1)
  • server/bun.lockb is excluded by !**/bun.lockb
📒 Files selected for processing (3)
  • server/integrations/microsoft/sync.ts (13 hunks)
  • server/integrations/microsoft/utils.ts (3 hunks)
  • server/server.ts (5 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-05-28T10:47:41.020Z
Learnt from: naSim087
PR: xynehq/xyne#484
File: server/integrations/google/sync.ts:222-222
Timestamp: 2025-05-28T10:47:41.020Z
Learning: The functions `handleGoogleDriveChange` and `getDriveChanges` in `server/integrations/google/sync.ts` are intentionally exported for future changes, even though they are not currently being imported by other modules.

Applied to files:

  • server/integrations/microsoft/utils.ts
  • server/integrations/microsoft/sync.ts
🧬 Code graph analysis (3)
server/server.ts (5)
server/api/search.ts (1)
  • chatTitleSchema (140-143)
server/api/chat/chat.ts (1)
  • GenerateChatTitleApi (7231-7288)
server/types.ts (1)
  • microsoftServiceSchema (228-234)
server/api/admin.ts (1)
  • AddServiceConnectionMicrosoft (457-549)
server/errors/index.ts (1)
  • InitialisationError (99-108)
server/integrations/microsoft/utils.ts (7)
server/logger/index.ts (2)
  • getLogger (36-93)
  • Subsystem (15-15)
server/integrations/google/index.ts (3)
  • downloadDir (2431-2431)
  • deleteDocument (1467-1484)
  • safeLoadPDF (2479-2495)
server/integrations/google/pdf-utils.ts (3)
  • downloadDir (11-11)
  • deleteDocument (13-30)
  • safeLoadPDF (60-77)
server/integrations/microsoft/client.ts (4)
  • MicrosoftGraphClient (23-37)
  • makeGraphApiCall (98-112)
  • downloadFileFromGraph (193-209)
  • Client (249-249)
server/utils.ts (1)
  • hashPdfFilename (325-332)
server/chunks.ts (1)
  • chunkDocument (123-133)
server/docxChunks.ts (1)
  • extractTextAndImagesWithChunksFromDocx (2412-2765)
server/integrations/microsoft/sync.ts (10)
server/integrations/microsoft/client.ts (4)
  • makeGraphApiCall (98-112)
  • downloadFileFromGraph (193-209)
  • MicrosoftGraphClient (23-37)
  • createMicrosoftGraphClient (60-95)
server/integrations/microsoft/sharepoint/index.ts (3)
  • discoverSharePointSites (12-47)
  • discoverSiteDrives (50-110)
  • processSiteDrives (113-243)
server/integrations/microsoft/index.ts (1)
  • loggerWithChild (82-84)
server/utils.ts (1)
  • getErrorMessage (106-109)
server/search/vespa.ts (3)
  • getDocumentOrNull (65-65)
  • DeleteDocument (67-67)
  • insertWithRetry (212-212)
server/integrations/microsoft/utils.ts (2)
  • getFilePermissions (278-353)
  • processFileContent (559-601)
server/db/syncJob.ts (2)
  • getAppSyncJobs (35-45)
  • updateSyncJob (65-87)
server/db/connector.ts (2)
  • db (3-3)
  • getMicrosoftAuthConnectorWithCredentials (307-368)
server/types.ts (1)
  • MicrosoftServiceCredentials (462-469)
server/db/syncHistory.ts (1)
  • insertSyncHistory (10-30)
🔇 Additional comments (14)
server/server.ts (2)

797-801: LGTM: route formatting only

Refactor to multi-line handler is no-op and clearer.


1061-1065: microsoftServiceSchema likely rejects valid service-account submissions (scopes).

The derived schema still inherits superRefine from createOAuthProvider. This route will fail when scopes aren’t provided, even though handler supplies defaults later.

Apply this diff after fixing the schema (in server/types.ts) or make scopes optional for this route:

-  .post(
-    "/microsoft/service_account",
-    zValidator("form", microsoftServiceSchema),
-    AddServiceConnectionMicrosoft,
-  )
+  .post(
+    "/microsoft/service_account",
+    zValidator("form", microsoftServiceSchema), // ensure schema doesn't require scopes
+    AddServiceConnectionMicrosoft,
+  )
#!/bin/bash
# Verify microsoftServiceSchema does not enforce scopes via superRefine
rg -n -C3 'microsoftServiceSchema|createOAuthProvider|superRefine' --type=ts server/types.ts
server/integrations/microsoft/sync.ts (7)

183-195: LGTM: switched delta fetch to makeGraphApiCall with paging.

Simpler and consistent with retry/backoff wrapper.


294-316: LGTM: unified file download path uses downloadFileFromGraph(graphClient,…).

Correct MicrosoftGraphClient usage.

Also applies to: 379-386


1167-1172: Filter out drives missing siteId to avoid keys like "undefined::driveId".

Prevents malformed deltaLinks keys.

-  drives = drives.filter(
-    (drive) =>
-      drive.id &&
-      !(`${drive.sharePointIds?.siteId}::${drive.id}` in deltaLinks),
-  )
+  drives = drives.filter(
+    (drive) =>
+      drive.id &&
+      drive.sharePointIds?.siteId &&
+      !(`${drive.sharePointIds.siteId}::${drive.id}` in deltaLinks),
+  )

83-84: Remove invalid Zod import.

zod has no named export "unknown". This breaks build.

-import { unknown } from "zod"

1240-1250: Don’t overwrite existing delta token with empty value.

Graph may omit @odata.deltaLink; this wipes a valid token.

-      //update with new Link
-      deltaLinks[driveKey] = newDeltaLink
+      // update only if a new deltaLink was returned
+      if (newDeltaLink) {
+        deltaLinks[driveKey] = newDeltaLink
+      } else {
+        loggerWithChild({ email }).warn(
+          `No new deltaLink for drive ${driveKey}; preserving existing token`,
+        )
+      }

1411-1420: Do not persist @microsoft.graph.downloadUrl in metadata.

Short‑lived pre‑authorized URL; storing raises leakage risk.

       metadata: JSON.stringify({
         size: item.size,
-        downloadUrl: item["@microsoft.graph.downloadUrl"],
         siteId: siteId,
         driveId: driveId,
         parentId: item.parentReference?.id ?? "",
         parentPath: item.parentReference?.path ?? "/",
         eTag: item.eTag ?? "",
       }),

1807-1907: Verify sync_jobs.config initialization and add null guard
I couldn’t find a migration that sets a default config for MicrosoftSharepoint sync jobs. Confirm that existing records always have a valid deltaLinks field or implement a backfill, and add a runtime fallback (e.g. default to { deltaLinks: {}, lastSyncedAt: new Date(0) }) before accessing config.deltaLinks.

server/integrations/microsoft/utils.ts (5)

98-129: LGTM: Service credential provider via @azure/identity.

Works for app-only flows and exposes expiry when needed.


277-353: LGTM: Robust permission extraction across shapes.

Handles modern and legacy Graph responses; dedup via Set.


425-473: LGTM: DOCX extraction via existing docxChunks util.

Buffer is a Uint8Array; safe for the helper.


559-602: LGTM: processFileContent routes by MIME; Excel disabled safely.

Returns chunks; callers handle metadata insert/update.


25-36: Ensure download directory exists in production too.

Current guard only creates it in non-prod; disk writes will fail in prod.

-// Download directory setup
-export const downloadDir = path.resolve(__dirname, "../../downloads")
-
-if (process.env.NODE_ENV !== "production") {
-  const init = () => {
-    if (!fs.existsSync(downloadDir)) {
-      fs.mkdirSync(downloadDir, { recursive: true })
-    }
-  }
-  init()
-}
+// Download directory setup
+export const downloadDir = process.env.DOWNLOAD_DIR
+  ? path.resolve(process.env.DOWNLOAD_DIR)
+  : path.resolve(__dirname, "../../downloads")
+
+try {
+  if (!fs.existsSync(downloadDir)) {
+    fs.mkdirSync(downloadDir, { recursive: true })
+  }
+} catch (e) {
+  // Fallback to OS temp
+  const osTmp = require("node:os").tmpdir()
+  const fallback = path.resolve(osTmp, "xyne-downloads")
+  if (!fs.existsSync(fallback)) fs.mkdirSync(fallback, { recursive: true })
+  ;(exports as any).downloadDir = fallback
+  Logger.warn({ fallback }, "Using temp download directory")
+}

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 0

🧹 Nitpick comments (3)
server/integrations/microsoft/sync.ts (2)

1162-1173: Avoid ‘undefined::driveId’ keys; filter drives missing siteId.

Keying deltaLinks by ${siteId}::${driveId} without ensuring siteId exists yields bad keys and collisions.

-  const driveSet = new Set(
-    drives.map((drive) => `${drive.sharePointIds?.siteId}::${drive.id}`),
-  )
+  const driveSet = new Set(
+    drives
+      .filter((d) => d.sharePointIds?.siteId && d.id)
+      .map((d) => `${d.sharePointIds!.siteId}::${d.id}`),
+  )
@@
-  drives = drives.filter(
-    (drive) =>
-      drive.id &&
-      !(`${drive.sharePointIds?.siteId}::${drive.id}` in deltaLinks),
-  )
+  drives = drives.filter(
+    (drive) =>
+      drive.id &&
+      drive.sharePointIds?.siteId &&
+      !(`${drive.sharePointIds.siteId}::${drive.id}` in deltaLinks),
+  )

1369-1373: Don’t throw when Vespa doc is already absent.

A missing doc on delete is normal; avoid noisy errors.

-    } else {
-      throw new Error("File not found in vespa")
-    }
+    } else {
+      loggerWithChild({ email }).debug(
+        `File ${fileId} not found in Vespa; nothing to delete`,
+      )
+    }
server/integrations/microsoft/index.ts (1)

73-74: Remove unused AWS Bedrock import.

It’s not referenced; avoid dead imports.

-import type { InvokeModelResponseFilterSensitiveLog } from "@aws-sdk/client-bedrock-runtime"
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 5ead1fa and f59e39c.

📒 Files selected for processing (5)
  • server/integrations/microsoft/index.ts (3 hunks)
  • server/integrations/microsoft/sharepoint/index.ts (1 hunks)
  • server/integrations/microsoft/sync.ts (14 hunks)
  • server/integrations/microsoft/utils.ts (3 hunks)
  • server/server.ts (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • server/integrations/microsoft/sharepoint/index.ts
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-05-28T10:47:41.020Z
Learnt from: naSim087
PR: xynehq/xyne#484
File: server/integrations/google/sync.ts:222-222
Timestamp: 2025-05-28T10:47:41.020Z
Learning: The functions `handleGoogleDriveChange` and `getDriveChanges` in `server/integrations/google/sync.ts` are intentionally exported for future changes, even though they are not currently being imported by other modules.

Applied to files:

  • server/integrations/microsoft/sync.ts
🧬 Code graph analysis (4)
server/server.ts (2)
server/types.ts (1)
  • microsoftServiceSchema (228-234)
server/api/admin.ts (1)
  • AddServiceConnectionMicrosoft (457-549)
server/integrations/microsoft/sync.ts (10)
server/integrations/microsoft/client.ts (4)
  • makeGraphApiCall (98-112)
  • downloadFileFromGraph (193-209)
  • MicrosoftGraphClient (23-37)
  • createMicrosoftGraphClient (60-95)
server/integrations/microsoft/sharepoint/index.ts (3)
  • discoverSharePointSites (12-47)
  • discoverSiteDrives (50-110)
  • processSiteDrives (113-246)
server/integrations/microsoft/index.ts (1)
  • loggerWithChild (82-84)
server/utils.ts (1)
  • getErrorMessage (106-109)
server/search/vespa.ts (3)
  • getDocumentOrNull (65-65)
  • DeleteDocument (67-67)
  • insertWithRetry (212-212)
server/integrations/microsoft/utils.ts (2)
  • getFilePermissionsSharepoint (278-318)
  • processFileContent (600-642)
server/db/syncJob.ts (2)
  • getAppSyncJobs (35-45)
  • updateSyncJob (65-87)
server/db/connector.ts (2)
  • db (3-3)
  • getMicrosoftAuthConnectorWithCredentials (307-368)
server/types.ts (1)
  • MicrosoftServiceCredentials (462-469)
server/db/syncHistory.ts (1)
  • insertSyncHistory (10-30)
server/integrations/microsoft/index.ts (6)
server/db/schema/connectors.ts (2)
  • SelectConnector (146-146)
  • connectors (60-109)
server/integrations/tracker.ts (1)
  • Tracker (89-325)
server/types.ts (1)
  • MicrosoftServiceCredentials (462-469)
server/integrations/microsoft/client.ts (1)
  • createMicrosoftGraphClient (60-95)
server/integrations/metricStream.ts (2)
  • sendWebsocketMessage (11-16)
  • closeWs (5-7)
server/integrations/microsoft/sharepoint/index.ts (3)
  • discoverSharePointSites (12-47)
  • discoverSiteDrives (50-110)
  • processSiteDrives (113-246)
server/integrations/microsoft/utils.ts (5)
server/integrations/google/pdf-utils.ts (3)
  • downloadDir (11-11)
  • deleteDocument (13-30)
  • safeLoadPDF (60-77)
server/integrations/microsoft/client.ts (4)
  • MicrosoftGraphClient (23-37)
  • makeGraphApiCall (98-112)
  • downloadFileFromGraph (193-209)
  • Client (249-249)
server/integrations/microsoft/index.ts (1)
  • loggerWithChild (82-84)
server/utils.ts (1)
  • hashPdfFilename (325-332)
server/docxChunks.ts (1)
  • extractTextAndImagesWithChunksFromDocx (2412-2765)
🔇 Additional comments (16)
server/server.ts (4)

45-45: LGTM: schema import

Importing microsoftServiceSchema here is correct and scoped.


79-80: LGTM: handler import

AddServiceConnectionMicrosoft import aligns with new route below.


797-801: LGTM: no functional change

Multiline formatting only; route behavior unchanged.


1061-1065: microsoftServiceSchema likely rejects valid service-account forms (superRefine scopes).

microsoftServiceSchema derives from createOAuthProvider.omit({ scopes: true }). If createOAuthProvider’s superRefine still enforces scopes when isUsingGlobalCred is false, this route will 400 on valid service-account submissions without scopes.

Apply one of:

  • Make scopes optional/ignored for service-account schema (or pass isUsingGlobalCred=true).
  • Create a separate base schema without the scopes superRefine and use that for service accounts.

Would you like a targeted patch in server/types.ts to conditionally relax scopes for service accounts?

server/integrations/microsoft/sync.ts (7)

185-195: LGTM: Graph delta pagination

Switching to makeGraphApiCall with pagination is correct and consistent with client helpers.


300-301: LGTM: unified file download helper

Using downloadFileFromGraph for DOCX/XLSX content fetch simplifies endpoints and error handling.

Also applies to: 317-318


1811-1911: LGTM: service-account SharePoint change loop

Client creation, deltaLinks in-place update, and sync history writes look correct. Ensure upstream fix for deltaLink guarding (above) lands before enabling.

Would you like a dry-run script to print current SharePoint sync jobs’ deltaLinks and confirm they remain non-empty after a run?


83-83: Invalid import: zod has no named export ‘unknown’.

This breaks build. Use z.unknown() via import { z } from "zod" if needed, otherwise remove.

-import { unknown } from "zod"
+// removed invalid import; use `import { z } from "zod"` and `z.unknown()` if needed

Based on learnings


1250-1252: Don’t overwrite a valid delta token with an empty value.

If Graph omits @odata.deltaLink, you wipe a good token.

-      //update with new Link
-      deltaLinks[driveKey] = newDeltaLink
+      // Update only if a new deltaLink was returned
+      if (newDeltaLink) {
+        deltaLinks[driveKey] = newDeltaLink
+      } else {
+        loggerWithChild({ email }).warn(
+          `No new deltaLink for drive ${driveKey}; preserving existing token`,
+        )
+      }

1297-1299: Guard substring call on possibly undefined nextLink.

substring on undefined will throw.

-    loggerWithChild({ email }).info(
-      `Fetching delta changes from: ${nextLink.substring(0, 100)}...`,
-    )
+    loggerWithChild({ email }).info(
+      `Fetching delta changes from: ${String(nextLink ?? "").substring(0, 100)}...`,
+    )

1413-1421: Remove @microsoft.graph.downloadUrl from persisted metadata.

It’s a short‑lived pre‑authorized URL; storing increases leakage risk.

       metadata: JSON.stringify({
         size: item.size,
-        downloadUrl: item["@microsoft.graph.downloadUrl"],
         siteId: siteId,
         driveId: driveId,
         parentId: item.parentReference?.id ?? "",
         parentPath: item.parentReference?.path ?? "/",
         eTag: item.eTag ?? "",
       }),
server/integrations/microsoft/index.ts (2)

817-924: LGTM: service-account ingestion orchestration

Discovery -> drive processing -> deltaLinks capture -> sync job write is sound. WebSocket progress envelope consistent with metricStream.


832-837: Fix createMicrosoftGraphClient argument order (tenantId is 5th optional param).

Passing tenantId as 4th sets it as refreshToken; tenantId is then lost.

-    const graphClient = createMicrosoftGraphClient(
-      credentials.access_token,
-      credentials.clientId,
-      credentials.clientSecret,
-      credentials.tenantId,
-    )
+    const graphClient = createMicrosoftGraphClient(
+      credentials.access_token,
+      credentials.clientId,
+      credentials.clientSecret,
+      undefined,
+      credentials.tenantId,
+    )
#!/bin/bash
# Find all call sites and show argument lists to confirm ordering
rg -nPU --type=ts '(?s)createMicrosoftGraphClient\s*\((.*?)\)' -C2
server/integrations/microsoft/utils.ts (3)

278-318: LGTM: SharePoint permissions extraction

Robustly handles grantedToV2 and parses emails from loginName; dedup via Set is good.


600-642: LGTM: content extraction wrapper

Clear type routing; safe fallbacks; avoids processing unsupported types.


26-36: Ensure download directory exists in production (and allow override).

Current code only creates the dir in non‑prod; writes will fail in prod.

-// Download directory setup
-export const downloadDir = path.resolve(__dirname, "../../downloads")
-
-if (process.env.NODE_ENV !== "production") {
-  const init = () => {
-    if (!fs.existsSync(downloadDir)) {
-      fs.mkdirSync(downloadDir, { recursive: true })
-    }
-  }
-  init()
-}
+// Download directory setup
+export const downloadDir = process.env.DOWNLOAD_DIR
+  ? path.resolve(process.env.DOWNLOAD_DIR)
+  : path.resolve(__dirname, "../../downloads")
+
+if (!fs.existsSync(downloadDir)) {
+  fs.mkdirSync(downloadDir, { recursive: true })
+}

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

🧹 Nitpick comments (2)
server/api/admin.ts (1)

520-521: Defer “Connected” status until ingestion completes

Set to NotConnected (or an “InProgress” equivalent) here; the ingestion flow already sets Connected on success. Avoids showing a false-positive connected state if ingestion fails.

-      ConnectorStatus.Connected, // Set as connected since we validated the connection
+      ConnectorStatus.NotConnected,
server/integrations/microsoft/sync.ts (1)

1182-1184: Implement cleanup for deleted drives

There’s a TODO to remove files for deleted drives. Leaving stale docs will bloat index and cause stale search results.

I can draft a helper deleteSharePointDriveDocuments(siteId::driveId) that queries by metadata and bulk-deletes matching docs. Want me to add it?

📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between b62bb51 and cd2c67c.

📒 Files selected for processing (3)
  • server/api/admin.ts (4 hunks)
  • server/integrations/microsoft/sync.ts (14 hunks)
  • server/utils.ts (4 hunks)
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-05-28T10:47:41.020Z
Learnt from: naSim087
PR: xynehq/xyne#484
File: server/integrations/google/sync.ts:222-222
Timestamp: 2025-05-28T10:47:41.020Z
Learning: The functions `handleGoogleDriveChange` and `getDriveChanges` in `server/integrations/google/sync.ts` are intentionally exported for future changes, even though they are not currently being imported by other modules.

Applied to files:

  • server/integrations/microsoft/sync.ts
  • server/utils.ts
🧬 Code graph analysis (3)
server/api/admin.ts (8)
server/integrations/microsoft/index.ts (2)
  • loggerWithChild (82-84)
  • handleMicrosoftServiceAccountIngestion (817-923)
server/db/user.ts (1)
  • getUserByEmail (147-156)
server/db/connector.ts (2)
  • db (3-3)
  • insertConnector (43-92)
server/errors/index.ts (3)
  • NoUserFound (322-327)
  • ConnectorNotCreated (338-343)
  • AddServiceConnectionError (330-335)
server/types.ts (2)
  • microsoftService (296-296)
  • MicrosoftServiceCredentials (462-469)
server/integrations/microsoft/utils.ts (1)
  • CustomServiceAuthProvider (100-130)
server/db/schema/connectors.ts (1)
  • selectConnectorSchema (135-139)
server/utils.ts (1)
  • getErrorMessage (106-109)
server/integrations/microsoft/sync.ts (10)
server/integrations/microsoft/client.ts (4)
  • makeGraphApiCall (98-112)
  • downloadFileFromGraph (193-209)
  • MicrosoftGraphClient (23-37)
  • createMicrosoftGraphClient (60-95)
server/integrations/microsoft/sharepoint/index.ts (3)
  • discoverSharePointSites (12-47)
  • discoverSiteDrives (50-110)
  • processSiteDrives (113-246)
server/integrations/microsoft/index.ts (1)
  • loggerWithChild (82-84)
server/utils.ts (1)
  • getErrorMessage (106-109)
server/search/vespa.ts (3)
  • getDocumentOrNull (65-65)
  • DeleteDocument (67-67)
  • insertWithRetry (212-212)
server/integrations/microsoft/utils.ts (2)
  • getFilePermissionsSharepoint (278-318)
  • processFileContent (600-642)
server/db/syncJob.ts (2)
  • getAppSyncJobs (35-45)
  • updateSyncJob (65-87)
server/db/connector.ts (2)
  • db (3-3)
  • getMicrosoftAuthConnectorWithCredentials (307-368)
server/types.ts (1)
  • MicrosoftServiceCredentials (462-469)
server/db/syncHistory.ts (1)
  • insertSyncHistory (10-30)
server/utils.ts (3)
server/integrations/microsoft/client.ts (2)
  • MicrosoftGraphClient (23-37)
  • updateMicrosoftGraphClient (39-57)
server/integrations/microsoft/config.ts (1)
  • scopes (2-5)
server/integrations/microsoft/utils.ts (1)
  • CustomServiceAuthProvider (100-130)
🔇 Additional comments (11)
server/api/admin.ts (3)

497-517: Confirm secrets are encrypted at rest

You serialize clientSecret into credentials. Ensure the connectors.credentials column is transparently encrypted (app-layer or DB-layer). If not, add encryption before insert.

Would you like a patch to wrap credential serialization with the project’s encryption helper?


475-476: Replace ts-ignore with a type assertion for safety

Use a typed cast instead of suppressing types.

-  // @ts-ignore
-  const form: microsoftService = c.req.valid("form")
+  const form = c.req.valid("form") as microsoftService

528-528: Handle the background promise to avoid unhandled rejections

Add a .catch() (or await in try/catch) on the ingestion call.

-    handleMicrosoftServiceAccountIngestion(email, connector)
+    handleMicrosoftServiceAccountIngestion(email, connector).catch((error) => {
+      loggerWithChild({ email }).error(
+        error,
+        `Background Microsoft service account ingestion failed for connector ${connector.id}: ${getErrorMessage(error)}`,
+      )
+    })
server/utils.ts (3)

213-219: Good fix: robust Google token refresh

Null-safe handling of refreshAccessToken avoids runtime crashes.


220-256: Microsoft token refresh flow is sensible

Branching between delegated (refresh token) and service principal (client credentials) and updating the Graph client in-place looks correct.

If you’ve seen Graph returning 401 with code other than InvalidAuthenticationToken, consider broadening the condition to include statusCode 401 on GraphError.


315-319: Expanded Microsoft app detection LGTM

Including MicrosoftSharepoint ensures retry/backoff paths trigger for SharePoint.

server/integrations/microsoft/sync.ts (5)

1168-1172: Filter out drives missing siteId before keying deltaLinks

Prevents keys like "undefined::driveId".

-  drives = drives.filter(
-    (drive) =>
-      drive.id &&
-      !(`${drive.sharePointIds?.siteId}::${drive.id}` in deltaLinks),
-  )
+  drives = drives.filter(
+    (drive) =>
+      drive.id &&
+      drive.sharePointIds?.siteId &&
+      !(`${drive.sharePointIds.siteId}::${drive.id}` in deltaLinks),
+  )

1249-1251: Don’t overwrite an existing delta token with an empty value

Only update when a new token is returned.

-      //update with new Link
-      deltaLinks[driveKey] = newDeltaLink
+      // Update only if a new deltaLink was returned
+      if (newDeltaLink) {
+        deltaLinks[driveKey] = newDeltaLink
+      } else {
+        loggerWithChild({ email }).warn(
+          `No new deltaLink for drive ${driveKey}; preserving existing token`,
+        )
+      }

1830-1836: Correct createMicrosoftGraphClient argument order

Service-account client is constructed with (accessToken, clientId, clientSecret, undefined, tenantId) — matches the signature.

Ensure other call sites also follow: createMicrosoftGraphClient(accessToken, clientId, clientSecret, refreshToken?, tenantId?).


1296-1299: Guard substring on possibly undefined nextLink

Avoids runtime error when nextLink is falsy.

-    loggerWithChild({ email }).info(
-      `Fetching delta changes from: ${nextLink.substring(0, 100)}...`,
-    )
+    loggerWithChild({ email }).info(
+      `Fetching delta changes from: ${String(nextLink ?? "").substring(0, 100)}...`,
+    )

1411-1420: Don’t persist @microsoft.graph.downloadUrl in metadata

It’s short‑lived and pre‑authorized; storing increases leakage risk.

       metadata: JSON.stringify({
         size: item.size,
-        downloadUrl: item["@microsoft.graph.downloadUrl"],
         siteId: siteId,
         driveId: driveId,
         parentId: item.parentReference?.id ?? "",
         parentPath: item.parentReference?.path ?? "/",
         eTag: item.eTag ?? "",
       }),

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
server/integrations/microsoft/utils.ts (1)

224-233: Avoid embedding @microsoft.graph.downloadUrl in metadata.

Same risk as in SharePoint path — short‑lived pre‑authorized URL.

Apply:

   return JSON.stringify({
     parentPath: file.parentReference?.path ?? "",
     parentName: file.parentReference?.name ?? "",
     size: file.size ?? 0,
     webUrl: file.webUrl ?? "",
-    downloadUrl: file["@microsoft.graph.downloadUrl"] ?? "",
     lastModifiedBy: file.lastModifiedBy?.user?.displayName ?? "",
     createdBy: file.createdBy?.user?.displayName ?? "",
   })
🧹 Nitpick comments (11)
server/integrations/microsoft/sync.ts (4)

1409-1414: Use correct entity from MIME type instead of Misc.

Improves search/UX and consistency with OneDrive path.

Apply:

-      entity: DriveEntity.Misc,
+      entity: getEntityFromMimeType(item.file?.mimeType),

And add the import (see suggested change for lines 62–67 below).


62-67: Import getEntityFromMimeType to classify SharePoint files properly.

Apply:

 import {
   getFilePermissionsSharepoint,
   MicrosoftMimeType,
   microsoftMimeTypeMap,
   processFileContent,
   type OneDriveFile,
+  getEntityFromMimeType,
 } from "./utils"

1249-1253: Preserve existing delta token if no new token returned.

Currently updates only when non-empty; log when preserved to aid debugging.

Apply:

-      if (newDeltaLink !== "") {
-        //update with new Link
-        deltaLinks[driveKey] = newDeltaLink
-      }
+      if (newDeltaLink) {
+        deltaLinks[driveKey] = newDeltaLink
+      } else {
+        loggerWithChild({ email }).warn(
+          `No new deltaLink for drive ${driveKey}; preserving existing token`,
+        )
+      }

1370-1374: Don't throw for “not found in Vespa” on delete path.

Deletion events can arrive after cleanup; skip throwing to reduce noisy error logs.

Apply:

-    } else {
-      throw new Error("File not found in vespa")
-    }
+    }
server/utils.ts (1)

199-202: Expand 401 detection for Microsoft/Graph.

Graph errors often use status/statusCode; include those for reliability.

Apply:

-      (error.code === 401 || error.code === "InvalidAuthenticationToken") &&
+      (error.code === 401 ||
+        error.status === 401 ||
+        error.statusCode === 401 ||
+        error.code === "InvalidAuthenticationToken") &&
server/integrations/microsoft/utils.ts (2)

38-51: Rename deleteDocument helper to avoid confusion with Vespa DeleteDocument.

Same name as search/vespa.DeleteDocument; can confuse readers and tooling.

Consider renaming to deleteTempFile and updating call sites.


455-456: Use Logger instead of console.error for consistency.

Adopt structured logging for errors.

Replace console.error with Logger.error(error, '...') in these blocks.

Also applies to: 508-509, 593-594

server/db/connector.ts (1)

321-325: Use a more accurate error type name for service-account connectors.

Throwing NoOauthConnectorFound for ServiceAccount is misleading.

Consider introducing NoServiceConnectorFound or reusing a neutral NotFound error.

server/api/admin.ts (3)

519-521: Set initial connector status to Connecting; flip to Connected after ingestion.

Prevents UI showing “Connected” while heavy ingestion runs.

Apply:

-      ConnectorStatus.Connected, // Set as connected since we validated the connection
+      ConnectorStatus.Connecting,

And let the ingestion flow set Connected on success (it already updates status in ingestion).


527-528: Avoid blocking the request on ingestion; run in background with error handling.

Ingestion can be long‑running; prefer fire‑and‑forget with a catch.

Apply:

-    await handleMicrosoftServiceAccountIngestion(email, connector)
+    void handleMicrosoftServiceAccountIngestion(email, connector).catch(
+      (error) => {
+        loggerWithChild({ email: sub }).error(
+          error,
+          `Background Microsoft service account ingestion failed for connector ${connector.id}: ${getErrorMessage(error)}`,
+        )
+      },
+    )

474-475: Replace ts-ignore with a type assertion for form.

Keeps type safety without suppressing errors.

Apply:

-  // @ts-ignore
-  const form: microsoftService = c.req.valid("form")
+  const form = c.req.valid("form") as microsoftService
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between cd2c67c and f47d2a4.

📒 Files selected for processing (9)
  • frontend/src/routes/_authenticated/admin/integrations/microsoft.tsx (5 hunks)
  • server/api/admin.ts (4 hunks)
  • server/db/connector.ts (5 hunks)
  • server/integrations/microsoft/sharepoint/index.ts (1 hunks)
  • server/integrations/microsoft/sync.ts (14 hunks)
  • server/integrations/microsoft/utils.ts (3 hunks)
  • server/package.json (3 hunks)
  • server/types.ts (5 hunks)
  • server/utils.ts (4 hunks)
🚧 Files skipped from review as they are similar to previous changes (3)
  • server/package.json
  • server/integrations/microsoft/sharepoint/index.ts
  • server/types.ts
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-05-28T10:47:41.020Z
Learnt from: naSim087
PR: xynehq/xyne#484
File: server/integrations/google/sync.ts:222-222
Timestamp: 2025-05-28T10:47:41.020Z
Learning: The functions `handleGoogleDriveChange` and `getDriveChanges` in `server/integrations/google/sync.ts` are intentionally exported for future changes, even though they are not currently being imported by other modules.

Applied to files:

  • server/integrations/microsoft/sync.ts
🧬 Code graph analysis (6)
frontend/src/routes/_authenticated/admin/integrations/microsoft.tsx (2)
frontend/src/api.ts (1)
  • api (4-4)
server/shared/types.ts (1)
  • Apps (38-38)
server/integrations/microsoft/utils.ts (5)
server/integrations/microsoft/client.ts (4)
  • MicrosoftGraphClient (23-37)
  • makeGraphApiCall (98-112)
  • downloadFileFromGraph (193-209)
  • Client (249-249)
server/integrations/microsoft/index.ts (1)
  • loggerWithChild (82-84)
server/utils.ts (1)
  • hashPdfFilename (328-335)
server/chunks.ts (1)
  • chunkDocument (123-133)
server/docxChunks.ts (1)
  • extractTextAndImagesWithChunksFromDocx (2412-2765)
server/utils.ts (3)
server/integrations/microsoft/client.ts (2)
  • MicrosoftGraphClient (23-37)
  • updateMicrosoftGraphClient (39-57)
server/integrations/microsoft/config.ts (1)
  • scopes (2-5)
server/integrations/microsoft/utils.ts (1)
  • CustomServiceAuthProvider (100-130)
server/api/admin.ts (8)
server/integrations/microsoft/index.ts (2)
  • loggerWithChild (82-84)
  • handleMicrosoftServiceAccountIngestion (817-923)
server/db/user.ts (1)
  • getUserByEmail (147-156)
server/db/connector.ts (2)
  • db (3-3)
  • insertConnector (42-91)
server/errors/index.ts (3)
  • NoUserFound (322-327)
  • ConnectorNotCreated (338-343)
  • AddServiceConnectionError (330-335)
server/types.ts (2)
  • microsoftService (295-295)
  • MicrosoftServiceCredentials (461-468)
server/integrations/microsoft/utils.ts (1)
  • CustomServiceAuthProvider (100-130)
server/db/schema/connectors.ts (1)
  • selectConnectorSchema (135-139)
server/utils.ts (1)
  • getErrorMessage (103-106)
server/integrations/microsoft/sync.ts (10)
server/integrations/microsoft/client.ts (4)
  • makeGraphApiCall (98-112)
  • downloadFileFromGraph (193-209)
  • MicrosoftGraphClient (23-37)
  • createMicrosoftGraphClient (60-95)
server/integrations/microsoft/sharepoint/index.ts (3)
  • discoverSharePointSites (12-47)
  • discoverSiteDrives (50-110)
  • processSiteDrives (113-252)
server/integrations/microsoft/index.ts (1)
  • loggerWithChild (82-84)
server/utils.ts (1)
  • getErrorMessage (103-106)
server/search/vespa.ts (3)
  • getDocumentOrNull (65-65)
  • DeleteDocument (67-67)
  • insertWithRetry (212-212)
server/integrations/microsoft/utils.ts (2)
  • getFilePermissionsSharepoint (278-318)
  • processFileContent (601-644)
server/db/syncJob.ts (2)
  • getAppSyncJobs (35-45)
  • updateSyncJob (65-87)
server/db/connector.ts (2)
  • db (3-3)
  • getMicrosoftAuthConnectorWithCredentials (306-367)
server/types.ts (1)
  • MicrosoftServiceCredentials (461-468)
server/db/syncHistory.ts (1)
  • insertSyncHistory (10-30)
server/db/connector.ts (5)
server/utils.ts (2)
  • IsGoogleApp (303-310)
  • IsMicrosoftApp (312-319)
server/types.ts (3)
  • OAuthCredentials (312-318)
  • TxnOrClient (310-310)
  • MicrosoftServiceCredentials (461-468)
server/db/schema/connectors.ts (3)
  • SelectConnector (146-146)
  • connectors (60-109)
  • selectConnectorSchema (135-139)
server/errors/index.ts (2)
  • NoOauthConnectorFound (290-295)
  • MissingOauthConnectorCredentialsError (262-271)
server/integrations/microsoft/utils.ts (1)
  • CustomServiceAuthProvider (100-130)
⏰ Context from checks skipped due to timeout of 90000ms. You can increase the timeout in your CodeRabbit configuration to a maximum of 15 minutes (900000ms). (1)
  • GitHub Check: build
🔇 Additional comments (9)
frontend/src/routes/_authenticated/admin/integrations/microsoft.tsx (1)

61-62: Fix scopes payload: drop empty entries

scopes: [value.scopes, ""] still pushes an empty scope to the provider, which breaks delegated auth. Please send only non-empty scope strings.

-          scopes: [value.scopes, ""],
+          scopes: value.scopes
+            .split(/[,\s]+/)
+            .map((scope) => scope.trim())
+            .filter(Boolean),
server/integrations/microsoft/sync.ts (3)

1298-1301: Guard nextLink before substring.

If nextLink is empty/undefined, substring throws.

Apply:

-    loggerWithChild({ email }).info(
-      `Fetching delta changes from: ${nextLink.substring(0, 100)}...`,
-    )
+    loggerWithChild({ email }).info(
+      `Fetching delta changes from: ${String(nextLink ?? "").substring(0, 100)}...`,
+    )

1167-1172: Filter out drives missing siteId before initial ingestion.

Avoid keys like "undefined::driveId".

Apply:

   drives = drives.filter(
     (drive) =>
-      drive.id &&
-      !(`${drive.sharePointIds?.siteId}::${drive.id}` in deltaLinks),
+      drive.id &&
+      drive.sharePointIds?.siteId &&
+      !(`${drive.sharePointIds.siteId}::${drive.id}` in deltaLinks),
   )

1414-1422: Do not persist @microsoft.graph.downloadUrl in metadata.

Pre-authorized, short‑lived URL; storing increases leakage risk and has little value. Remove it.

Apply:

       metadata: JSON.stringify({
         size: item.size,
-        downloadUrl: item["@microsoft.graph.downloadUrl"],
         siteId: siteId,
         driveId: driveId,
         parentId: item.parentReference?.id ?? "",
         parentPath: item.parentReference?.path ?? "/",
         eTag: item.eTag ?? "",
       }),
server/integrations/microsoft/utils.ts (2)

26-36: Ensure download directory exists in all environments.

Only creating it in non-prod breaks PDF processing in prod.

Apply:

-// Download directory setup
-export const downloadDir = path.resolve(__dirname, "../../downloads")
-
-if (process.env.NODE_ENV !== "production") {
-  const init = () => {
-    if (!fs.existsSync(downloadDir)) {
-      fs.mkdirSync(downloadDir, { recursive: true })
-    }
-  }
-  init()
-}
+// Download directory setup
+export const downloadDir = process.env.DOWNLOAD_DIR
+  ? path.resolve(process.env.DOWNLOAD_DIR)
+  : path.resolve(__dirname, "../../downloads")
+
+if (!fs.existsSync(downloadDir)) {
+  fs.mkdirSync(downloadDir, { recursive: true })
+}

278-318: Permissions parsing logic looks solid.

Covers modern and legacy shapes, dedupes, and handles fallbacks.

server/db/connector.ts (1)

306-367: Service-account token refresh and persistence LGTM.

Expired token refresh via ClientSecretCredential and writing updated credentials back to DB is correct.

server/api/admin.ts (1)

132-136: Remove unused Azure/Microsoft Graph imports.

Not used; keep surface minimal.

Apply:

-import { ClientSecretCredential } from "@azure/identity"
-import { Client as GraphClient } from "@microsoft/microsoft-graph-client"
-import type { AuthenticationProvider } from "@microsoft/microsoft-graph-client"
server/utils.ts (1)

199-207: Fix unsafe non-null assertion in Google token refresh.

refreshAccessToken can fail; current code can throw at runtime.

Apply:

-      if (authClient instanceof OAuth2Client) {
+      if (authClient instanceof OAuth2Client) {
         if (IsGoogleApp(app)) {
           Logger.info(`401 encountered, refreshing OAuth access token...`)
-          const { credentials } = await authClient?.refreshAccessToken()!
-          authClient?.setCredentials(credentials)
+          const refreshResult = await authClient.refreshAccessToken()
+          if (!refreshResult?.credentials) {
+            throw new Error("Failed to refresh Google OAuth access token")
+          }
+          const { credentials } = refreshResult
+          authClient.setCredentials(credentials)
           return retryWithBackoff(
             fn,
             context,
             app,
             retries + 1,
             authClient,
           )

Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 2

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (6)
server/integrations/microsoft/client.ts (4)

21-36: getCurrentTokens returns stale values after token refresh

getCurrentTokens closes over constructor arguments; updateMicrosoftGraphClient mutates fields but getCurrentTokens keeps returning the original tokens. Read from object fields instead.

Apply these diffs to (a) add expiresAt to the interface, (b) have getCurrentTokens read current fields:

 export interface MicrosoftGraphClient {
   client: Client
   accessToken: string
+  expiresAt?: Date
   refreshToken?: string // Only for delegated clients
   tenantId?: string // Only for service clients
   clientId: string
   clientSecret: string
   betaClient: Client
   // Helper methods to get updated tokens after refresh
   getCurrentTokens(): {
     accessToken: string
     refreshToken?: string
     expiresAt?: Date
   }
 }
 export const createMicrosoftGraphClient = (
   accessToken: string,
   clientId: string,
   clientSecret: string,
   refreshToken?: string,
   tenantId?: string,
   tokenExpiresAt?: Date,
 ): MicrosoftGraphClient => {
   const authProvider = new CustomAuthProvider(accessToken)

   const client = Client.initWithMiddleware({
     authProvider,
     defaultVersion: "v1.0",
   })
   const betaClient = Client.initWithMiddleware({
     authProvider,
     defaultVersion: "beta",
   })
 
-  return {
-    client,
-    accessToken,
-    refreshToken,
-    tenantId,
-    clientId,
-    clientSecret,
-    betaClient,
-    getCurrentTokens() {
-      return {
-        accessToken,
-        refreshToken,
-        expiresAt: tokenExpiresAt,
-      }
-    },
-  }
+  const graph: MicrosoftGraphClient = {
+    client,
+    betaClient,
+    accessToken,
+    refreshToken,
+    tenantId,
+    clientId,
+    clientSecret,
+    expiresAt: tokenExpiresAt,
+    getCurrentTokens() {
+      return {
+        accessToken: graph.accessToken,
+        refreshToken: graph.refreshToken,
+        expiresAt: graph.expiresAt,
+      }
+    },
+  }
+  return graph
 }

96-111: Normalize absolute Graph URLs (supports @odata.nextLink/@odata.deltaLink)

makeGraphApiCall fails when endpoint is an absolute URL. Normalize like makePagedGraphApiCall does.

 export const makeGraphApiCall = async (
   graphClient: MicrosoftGraphClient,
   endpoint: string,
 ): Promise<any> => {
   return retryWithBackoff(
     async () => {
-      const result = await graphClient.client.api(endpoint).get()
+      let path = endpoint
+      if (endpoint.startsWith("http")) {
+        const url = new URL(endpoint)
+        path = url.pathname + url.search
+      }
+      const result = await graphClient.client.api(path).get()
       return result
     },
     `Making Microsoft Graph API call to ${endpoint}`,
     Apps.MicrosoftDrive,
     1,
     graphClient,
   )
 }

112-127: Same absolute-URL issue for beta client

Mirror the normalization for makeBetaGraphApiCall.

   return retryWithBackoff(
     async () => {
-      const result = await graphClient.betaClient.api(endpoint).get(options)
+      let path = endpoint
+      if (endpoint.startsWith("http")) {
+        const url = new URL(endpoint)
+        path = url.pathname + url.search
+      }
+      const result = await graphClient.betaClient.api(path).get(options)
       return result
     },

128-151: Handle absolute URLs and keep headers on normalized requests

Normalize endpoint and keep custom headers on the request.

   return retryWithBackoff(
     async () => {
-      const request = graphClient.client.api(endpoint)
+      let path = endpoint
+      if (endpoint.startsWith("http")) {
+        const url = new URL(endpoint)
+        path = url.pathname + url.search
+      }
+      const request = graphClient.client.api(path)
 
       // Add custom headers
       Object.entries(headers).forEach(([key, value]) => {
         request.header(key, value)
       })
 
       const result = await request.get(options)
       return result
server/integrations/microsoft/sync.ts (1)

741-748: Avoid calling Graph with empty endpoint for DeletedItems

deletedItemsEndpoint defaults to "", leading to a request to the service root. Guard or start with the proper delta URL only if available.

-      const deletedItemsResponse = await makeGraphApiCall(
-        client,
-        deletedItemsEndpoint,
-      )
+      if (!deletedItemsEndpoint) {
+        Logger.info("Skipping DeletedItems delta: no token yet")
+        return { newDeltaTokens, stats, changesExist }
+      }
+      const deletedItemsResponse = await makeGraphApiCall(client, deletedItemsEndpoint)
server/integrations/microsoft/index.ts (1)

951-957: Critical: Incorrect argument order for createMicrosoftGraphClient in OAuth flow
Function signature is (accessToken, clientId, clientSecret, refreshToken?), but the code passes refresh_token as the second argument and shifts clientId/clientSecret, which breaks token refreshing.

  • In server/integrations/microsoft/index.ts (lines 951–957), swap the 2nd–4th parameters:
- createMicrosoftGraphClient(
-   oauthTokens.access_token,
-   oauthTokens.refresh_token,
-   microsoftProvider.clientId!,
-   microsoftProvider.clientSecret! as string,
- )
+ createMicrosoftGraphClient(
+   oauthTokens.access_token,
+   microsoftProvider.clientId!,
+   microsoftProvider.clientSecret! as string,
+   oauthTokens.refresh_token
+)
  • Likewise, update the usage example in docs/microsoft-integration/README.md to pass (accessToken, clientId, clientSecret, refreshToken).
🧹 Nitpick comments (16)
server/integrations/microsoft/client.ts (1)

153-189: Parity: pass options on absolute @odata.nextLink branch

Pass options in the absolute-URL branch for consistency.

-          return await graphClient.client.api(path).get()
+          return await graphClient.client.api(path).get(options)
server/integrations/microsoft/sharepoint/index.ts (2)

146-151: Absolute nextLink handling depends on client fix

@odata.nextLink will be absolute; with the proposed client helper normalization, this loop will work with both relative and absolute endpoints. If you don’t adopt that change, normalize here.


135-137: Improve log context

Log siteId instead of repeating the drive name twice.

-        loggerWithChild({ email: userEmail }).info(
-          `Processing drive: ${siteDrive.name} from site: ${siteDrive.name}`,
-        )
+        loggerWithChild({ email: userEmail }).info(
+          `Processing drive: ${siteDrive.name} from siteId: ${siteDrive.sharePointIds.siteId}`,
+        )
server/integrations/microsoft/sync.ts (3)

166-196: Absolute URL endpoints in delta flows

getOneDriveDelta may pass absolute URLs to makeGraphApiCall. With the proposed normalization in client helpers, this becomes safe. Without it, normalize here before calling.


1409-1411: Use mime-type mapping for entity instead of hardcoding Misc

Align with other paths and improve result fidelity.

-      entity: DriveEntity.Misc,
+      entity: getEntityFromMimeType(item.file?.mimeType),

Additionally, import it:

-import {
+import {
   getFilePermissionsSharepoint,
   MicrosoftMimeType,
   microsoftMimeTypeMap,
   processFileContent,
   loggerWithChild,
+  getEntityFromMimeType,
   type OneDriveFile,
 } from "./utils"

347-355: Use retry/backoff wrapper for text extraction requests

This direct client call bypasses retry/backoff and absolute-URL normalization patterns. Use the header-capable helper.

-        const contentResponse = await graphClient.client
-          .api(`/me/drive/items/${file.id}/content`)
-          .header("Accept", "text/plain")
-          .get()
+        const contentResponse = await makeGraphApiCallWithHeaders(
+          graphClient,
+          `/me/drive/items/${file.id}/content`,
+          { Accept: "text/plain" },
+        )
server/integrations/microsoft/index.ts (4)

835-845: Progress interval cleanup: stop using fixed 8s timeout; clear in finally.

The setTimeout-based clearInterval after 8s can stop updates mid-run or keep running after completion. Clear the interval deterministically on success and on error.

Apply this diff:

-    const interval = setInterval(() => {
+    const interval = setInterval(() => {
       sendWebsocketMessage(
         JSON.stringify({
           progress: tracker.getProgress(),
           userStats: tracker.getServiceAccountProgress().userStats,
           startTime: tracker.getStartTime(),
         }),
         connector!.externalId,
       )
     }, 4000)
-
-    // const driveTokens = {}
-    setTimeout(() => {
-      clearInterval(interval)
-    }, 8000)
+    // Ensure cleanup happens regardless of outcome

And add clears on both success and error paths:

       await insertSyncJob(trx, {
         ...
       })
 
       loggerWithChild({ email: email! }).info(
         `Microsoft SharePoint service account ingestion completed (jobId: ${jobId})`,
       )
-      closeWs(connector!.externalId)
+      clearInterval(interval)
+      closeWs(connector!.externalId)
   } catch (error) {
     loggerWithChild({ email: email! }).error(
       error,
       `handleMicrosoftServiceAccountIngestion (jobId: ${jobId}) failed: ${(error as Error).message}`,
     )
 
     if (connector) {
       await db
         .update(connectors)
         .set({
           status: ConnectorStatus.Failed,
         })
         .where(eq(connectors.id, connector.id))
-      closeWs(connector.externalId)
+      clearInterval(interval)
+      closeWs(connector.externalId)
     }

Also note: Tracker.getProgress() for Microsoft returns 0 by design; consider emitting a computed percent from tracker.userStats to avoid a “stuck at 0%” UI.

Also applies to: 861-864


940-950: Same interval cleanup issue in OAuth ingestion.

Use finally-style cleanup instead of a fixed 8s timeout to avoid stale or prematurely stopped updates.

Apply this diff:

-  const interval = setInterval(() => {
+  const interval = setInterval(() => {
     sendWebsocketMessage(
       JSON.stringify({
         progress: tracker.getProgress(),
         userStats: tracker.getOAuthProgress().userStats,
         startTime: tracker.getStartTime(),
       }),
       connector.externalId,
     )
   }, 4000)
-  setTimeout(() => {
-    clearInterval(interval)
-  }, 8000)
+  // On success:
+  // clearInterval(interval) just before closeWs(connector.externalId)
+  // On error:
+  // clearInterval(interval) before marking Failed and closing WS

Also applies to: 987-990


853-858: Heads-up: N+1 Graph calls for permissions can be expensive.

processSiteDrives fetches item permissions per file, which can be a major latency and throttling hotspot for large drives. Consider:

  • Batch requests or defer per-item permission expansion to a follow-up pass.
  • Limit concurrency with a queue (e.g., p-limit) and exponential backoff on 429s.
  • Cache group membership lookups if you expand groups.

68-74: Remove unused imports
Both Client and InvokeModelResponseFilterSensitiveLog are not used in this file; please remove them.

--- a/server/integrations/microsoft/index.ts
@@ lines 68,73
- import { Client } from "@microsoft/microsoft-graph-client"
- import type { InvokeModelResponseFilterSensitiveLog } from "@aws-sdk/client-bedrock-runtime"
server/integrations/microsoft/utils.ts (6)

42-54: Throw a typed error for deleteDocument to align with error taxonomy.

Use DeleteDocumentError (as used in Google integration) for consistency and observability.

Apply this diff:

+import { DeleteDocumentError } from "@/errors"
...
   } catch (err) {
-    Logger.error(
-      err,
-      `Error deleting file at ${filePath}: ${err} ${(err as Error).stack}`,
-      err,
-    )
-    throw new Error(`Error deleting file: ${(err as Error).message}`)
+    Logger.error(
+      err,
+      `Error deleting file at ${filePath}: ${err} ${(err as Error).stack}`,
+      err,
+    )
+    throw new DeleteDocumentError({
+      message: "Error deleting file",
+      cause: err as Error,
+      integration: Apps.MicrosoftDrive,
+      entity: DriveEntity.PDF,
+    })
   }

398-464: PDF processing: add size guard, fill owner/permissions, and use Logger.

  • Skip oversized files before downloading.
  • Populate owner and permissions (OneDrive vs SharePoint).
  • Use Logger instead of console.error.

Apply this diff:

   for (const file of pdfFiles) {
     try {
+      // Skip large files
+      if (isFileTooLarge(file.size)) {
+        Logger.warn(
+          `Skipping PDF ${file.name} as it's larger than ${MAX_ONEDRIVE_FILE_SIZE}MB`,
+        )
+        continue
+      }
       // Download PDF content
       const pdfBuffer = await downloadFileFromGraph(
         graphClient,
         file.id!,
         file.parentReference?.driveId!,
       )
@@
       // Create Vespa document structure
-      results.push({
+      const owner =
+        file.createdBy?.user?.displayName ??
+        file.lastModifiedBy?.user?.displayName ??
+        userEmail
+      const permissions = file.parentReference?.driveId
+        ? await getFilePermissionsSharepoint(
+            graphClient,
+            file.id!,
+            file.parentReference.driveId!,
+          )
+        : await getFilePermissions(graphClient, file.id!)
+
+      results.push({
         title: file.name!,
         url: file.webUrl ?? "",
         app: Apps.MicrosoftDrive,
         docId: file.id!,
         parentId: file.parentReference?.id ?? null,
-        owner: "", // Extract from file.createdBy if available
+        owner,
         photoLink: "",
         ownerEmail: userEmail,
         entity: DriveEntity.PDF,
-        chunks: chunks.map((c: any) => c.chunk),
-        permissions: [], // Process file.permissions if available
+        chunks: chunks.map((c: any) => c.chunk),
+        permissions,
         mimeType: file.file?.mimeType ?? "",
@@
-    } catch (error) {
-      console.error(`Error processing PDF ${file.name}:`, error)
+    } catch (error) {
+      Logger.error(error, `Error processing PDF ${file.name}`)
       continue
     }
   }

467-517: Word processing: robust app detection, size guard, owner/permissions, Logger.

  • Don’t infer app from refreshToken; detect SharePoint via parentReference.driveId.
  • Add size guard and fill owner/permissions, unify Logger.

Apply this diff:

   for (const file of wordFiles) {
     try {
+      if (isFileTooLarge(file.size)) {
+        Logger.warn(
+          `Skipping DOCX ${file.name} as it's larger than ${MAX_ONEDRIVE_FILE_SIZE}MB`,
+        )
+        continue
+      }
       // Download DOCX content
@@
-      results.push({
+      const app =
+        file.parentReference?.driveId ? Apps.MicrosoftSharepoint : Apps.MicrosoftDrive
+      const owner =
+        file.createdBy?.user?.displayName ??
+        file.lastModifiedBy?.user?.displayName ??
+        userEmail
+      const permissions = file.parentReference?.driveId
+        ? await getFilePermissionsSharepoint(
+            graphClient,
+            file.id!,
+            file.parentReference.driveId!,
+          )
+        : await getFilePermissions(graphClient, file.id!)
+
+      results.push({
         title: file.name!,
         url: file.webUrl ?? "",
-        app: graphClient.refreshToken
-          ? Apps.MicrosoftDrive
-          : Apps.MicrosoftSharepoint,
+        app,
         docId: file.id!,
         parentId: file.parentReference?.id ?? null,
-        owner: "",
+        owner,
         photoLink: "",
         ownerEmail: userEmail,
         entity: DriveEntity.WordDocument, // Reuse Google's entity types
         chunks: extractedContent.text_chunks || [], // Use text_chunks property
-        permissions: [],
+        permissions,
         mimeType: file.file?.mimeType ?? "",
@@
-    } catch (error) {
-      console.error(`Error processing Word document ${file.name}:`, error)
+    } catch (error) {
+      Logger.error(error, `Error processing Word document ${file.name}`)
       continue
     }
   }

520-602: Consistent logging: replace console.error with Logger in Excel processing.

Stay consistent with module logging.

Apply this diff:

-          console.error(
-            `Error processing worksheet ${worksheet.name}:`,
-            worksheetError,
-          )
+          Logger.error(worksheetError, `Error processing worksheet ${worksheet.name}`)
@@
-    } catch (error) {
-      console.error(`Error processing Excel file ${file.name}:`, error)
+    } catch (error) {
+      Logger.error(error, `Error processing Excel file ${file.name}`)
       continue
     }

604-647: Consistent logging in processFileContent catch.

Use Logger to capture stack and context.

Apply this diff:

   } catch (error) {
-    console.error(`Error processing file content for ${file.name}:`, error)
+    Logger.error(error, `Error processing file content for ${file.name}`)
     return [] // Fallback to metadata only on error
   }

664-671: Normalize lastSyncedAt to ISO string for storage consistency.

Elsewhere you store ISO strings; keeping types consistent reduces parsing issues.

Apply this diff:

   return {
     type: "microsoftDriveDeltaToken" as const,
     driveToken: "", // Will be set after first sync
     contactsToken: "", // For future contacts sync
-    lastSyncedAt: new Date(),
+    lastSyncedAt: new Date().toISOString(),
   }
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between f47d2a4 and b1ebdb8.

📒 Files selected for processing (6)
  • server/integrations/microsoft/client.ts (7 hunks)
  • server/integrations/microsoft/index.ts (3 hunks)
  • server/integrations/microsoft/sharepoint/index.ts (1 hunks)
  • server/integrations/microsoft/sync.ts (14 hunks)
  • server/integrations/microsoft/utils.ts (3 hunks)
  • server/package.json (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (1)
  • server/package.json
🧰 Additional context used
🧠 Learnings (2)
📚 Learning: 2025-05-28T10:47:41.020Z
Learnt from: naSim087
PR: xynehq/xyne#484
File: server/integrations/google/sync.ts:222-222
Timestamp: 2025-05-28T10:47:41.020Z
Learning: The functions `handleGoogleDriveChange` and `getDriveChanges` in `server/integrations/google/sync.ts` are intentionally exported for future changes, even though they are not currently being imported by other modules.

Applied to files:

  • server/integrations/microsoft/sync.ts
  • server/integrations/microsoft/index.ts
  • server/integrations/microsoft/utils.ts
  • server/integrations/microsoft/sharepoint/index.ts
📚 Learning: 2025-06-16T11:56:22.752Z
Learnt from: oindrila-b
PR: xynehq/xyne#545
File: server/integrations/google/index.ts:137-145
Timestamp: 2025-06-16T11:56:22.752Z
Learning: In server/integrations/google/index.ts, both Logger (base logger) and loggerWithChild (factory for email-scoped child loggers) are intentionally maintained to handle different logging scenarios - one for when email context is not available and one for when it is available.

Applied to files:

  • server/integrations/microsoft/utils.ts
🧬 Code graph analysis (5)
server/integrations/microsoft/sync.ts (9)
server/integrations/microsoft/client.ts (4)
  • makeGraphApiCall (97-111)
  • downloadFileFromGraph (192-208)
  • MicrosoftGraphClient (22-36)
  • createMicrosoftGraphClient (59-94)
server/integrations/microsoft/sharepoint/index.ts (3)
  • discoverSharePointSites (11-46)
  • discoverSiteDrives (49-109)
  • processSiteDrives (112-251)
server/integrations/microsoft/utils.ts (3)
  • loggerWithChild (23-25)
  • getFilePermissionsSharepoint (281-321)
  • processFileContent (604-647)
server/utils.ts (1)
  • getErrorMessage (103-106)
server/search/vespa.ts (3)
  • getDocumentOrNull (65-65)
  • DeleteDocument (67-67)
  • insertWithRetry (212-212)
server/db/syncJob.ts (2)
  • getAppSyncJobs (35-45)
  • updateSyncJob (65-87)
server/db/connector.ts (2)
  • db (3-3)
  • getMicrosoftAuthConnectorWithCredentials (306-367)
server/types.ts (1)
  • MicrosoftServiceCredentials (461-468)
server/db/syncHistory.ts (1)
  • insertSyncHistory (10-30)
server/integrations/microsoft/client.ts (1)
server/shared/types.ts (1)
  • Apps (38-38)
server/integrations/microsoft/index.ts (7)
server/db/schema/connectors.ts (2)
  • SelectConnector (146-146)
  • connectors (60-109)
server/integrations/microsoft/utils.ts (1)
  • loggerWithChild (23-25)
server/integrations/tracker.ts (1)
  • Tracker (89-325)
server/types.ts (1)
  • MicrosoftServiceCredentials (461-468)
server/integrations/microsoft/client.ts (1)
  • createMicrosoftGraphClient (59-94)
server/integrations/metricStream.ts (2)
  • sendWebsocketMessage (11-16)
  • closeWs (5-7)
server/integrations/microsoft/sharepoint/index.ts (3)
  • discoverSharePointSites (11-46)
  • discoverSiteDrives (49-109)
  • processSiteDrives (112-251)
server/integrations/microsoft/utils.ts (6)
server/integrations/google/index.ts (4)
  • loggerWithChild (147-149)
  • downloadDir (2431-2431)
  • deleteDocument (1467-1484)
  • safeLoadPDF (2479-2495)
server/logger/index.ts (3)
  • getLoggerWithChild (192-200)
  • Subsystem (15-15)
  • getLogger (36-93)
server/integrations/microsoft/client.ts (4)
  • MicrosoftGraphClient (22-36)
  • makeGraphApiCall (97-111)
  • downloadFileFromGraph (192-208)
  • Client (248-248)
server/utils.ts (1)
  • hashPdfFilename (328-335)
server/chunks.ts (1)
  • chunkDocument (123-133)
server/docxChunks.ts (1)
  • extractTextAndImagesWithChunksFromDocx (2412-2765)
server/integrations/microsoft/sharepoint/index.ts (4)
server/integrations/microsoft/client.ts (2)
  • MicrosoftGraphClient (22-36)
  • makeGraphApiCall (97-111)
server/integrations/microsoft/utils.ts (4)
  • loggerWithChild (23-25)
  • getFilePermissionsSharepoint (281-321)
  • getEntityFromMimeType (217-223)
  • processFileContent (604-647)
server/integrations/tracker.ts (1)
  • Tracker (89-325)
server/search/vespa.ts (1)
  • insertWithRetry (212-212)
🔇 Additional comments (7)
server/integrations/microsoft/client.ts (1)

38-56: Propagate refreshed expiry and keep client state consistent

Add expiresAt to the updater and assign it to the instance. This also addresses the previously raised token-expiry propagation concern.

 export const updateMicrosoftGraphClient = (
   graphClient: MicrosoftGraphClient,
   accessToken: string,
-  refreshToken?: string,
+  refreshToken?: string,
+  expiresAt?: Date,
 ) => {
   graphClient.accessToken = accessToken
   if (refreshToken) graphClient.refreshToken = refreshToken
+  if (expiresAt) graphClient.expiresAt = expiresAt

   const authProvider = new CustomAuthProvider(accessToken)

   graphClient.client = Client.initWithMiddleware({
     authProvider,
     defaultVersion: "v1.0",
   })
   graphClient.betaClient = Client.initWithMiddleware({
     authProvider,
     defaultVersion: "beta",
   })
 }
server/integrations/microsoft/sharepoint/index.ts (1)

174-184: Remove @microsoft.graph.downloadUrl from persisted metadata

This URL is short‑lived and pre‑authorized; persisting increases leakage risk without value.

                   metadata: JSON.stringify({
                     size: item.size,
-                    downloadUrl: item["@microsoft.graph.downloadUrl"],
                     siteId: siteDrive.sharePointIds?.siteId,
                     driveId: siteDrive.id,
                     driveName: siteDrive.name,
                     driveType: siteDrive.driveType,
                     parentId: item.parentReference?.id ?? "",
                     parentPath: item.parentReference?.path ?? "/",
                     eTag: item.eTag ?? "",
                   }),
server/integrations/microsoft/sync.ts (3)

1298-1300: Guard substring on possibly empty/undefined nextLink

Avoid runtime error when nextLink is falsy.

-    loggerWithChild({ email }).info(
-      `Fetching delta changes from: ${nextLink.substring(0, 100)}...`,
-    )
+    loggerWithChild({ email }).info(
+      `Fetching delta changes from: ${String(nextLink ?? "").substring(0, 100)}...`,
+    )

1415-1422: Do not persist @microsoft.graph.downloadUrl in metadata

Security/privacy risk; short‑lived pre‑authorized URL should not be stored.

       metadata: JSON.stringify({
         size: item.size,
-        downloadUrl: item["@microsoft.graph.downloadUrl"],
         siteId: siteId,
         driveId: driveId,
         parentId: item.parentReference?.id ?? "",
         parentPath: item.parentReference?.path ?? "/",
         eTag: item.eTag ?? "",
       }),

1150-1192: Filter out drives with missing siteId before initial ingestion

Prevents keys like "undefined::driveId" and ingestion of malformed entries.

-  //Filters out newly added drives
-  drives = drives.filter(
-    (drive) =>
-      drive.id &&
-      !(`${drive.sharePointIds?.siteId}::${drive.id}` in deltaLinks),
-  )
+  // Filters out newly added drives; ensure siteId exists
+  drives = drives.filter(
+    (drive) =>
+      Boolean(drive.id) &&
+      Boolean(drive.sharePointIds?.siteId) &&
+      !(`${drive.sharePointIds!.siteId}::${drive.id}` in deltaLinks),
+  )
server/integrations/microsoft/utils.ts (2)

29-39: Ensure download directory in all environments (not only non-prod).

This was flagged earlier; current logic only creates the directory in non-production, causing failures in prod.

Apply this diff:

-// Download directory setup
-export const downloadDir = path.resolve(__dirname, "../../downloads")
-
-if (process.env.NODE_ENV !== "production") {
-  const init = () => {
-    if (!fs.existsSync(downloadDir)) {
-      fs.mkdirSync(downloadDir, { recursive: true })
-    }
-  }
-  init()
-}
+// Download directory setup
+export const downloadDir =
+  process.env.DOWNLOAD_DIR
+    ? path.resolve(process.env.DOWNLOAD_DIR)
+    : path.resolve(__dirname, "../../downloads")
+
+try {
+  if (!fs.existsSync(downloadDir)) {
+    fs.mkdirSync(downloadDir, { recursive: true })
+  }
+} catch (e) {
+  const fallback = path.resolve(require("os").tmpdir(), "xyne_downloads")
+  if (!fs.existsSync(fallback)) fs.mkdirSync(fallback, { recursive: true })
+  // eslint-disable-next-line @typescript-eslint/ban-ts-comment
+  // @ts-ignore reassign to known-good path
+  ;(exports as any).downloadDir = fallback
+  Logger.warn({ fallback }, "Falling back to tmp download dir")
+}

102-133: Comment inaccurate: CustomServiceAuthProvider is used in server/utils.ts, server/db/connector.ts, and server/api/admin.ts, so it’s not unused.

Likely an incorrect or invalid review comment.

devesh-juspay
devesh-juspay previously approved these changes Sep 29, 2025
devesh-juspay
devesh-juspay previously approved these changes Sep 29, 2025
Copy link
Contributor

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
server/integrations/microsoft/index.ts (1)

953-958: Swap refreshToken and clientId/clientSecret arguments in OAuth flow

Apply this in server/integrations/microsoft/index.ts:

-    const graphClient = createMicrosoftGraphClient(
-      oauthTokens.access_token,
-      oauthTokens.refresh_token,
-      microsoftProvider.clientId!,
-      microsoftProvider.clientSecret! as string,
-    )
+    const graphClient = createMicrosoftGraphClient(
+      oauthTokens.access_token,
+      microsoftProvider.clientId!,
+      microsoftProvider.clientSecret! as string,
+      oauthTokens.refresh_token,
+      microsoftProvider.tenantId, // include if available
+    )
🧹 Nitpick comments (3)
server/integrations/microsoft/index.ts (3)

828-834: Pass token expiry when constructing service-account Graph client (optional but helpful).

Improves observability/refresh logic by preserving expiresAt.

     const graphClient = createMicrosoftGraphClient(
       credentials.access_token,
       credentials.clientId,
       credentials.clientSecret,
       undefined,
       credentials.tenantId,
+      credentials.expires_at ? new Date(credentials.expires_at) : undefined,
     )

6-6: Remove unused imports.

Keep the module light and lint‑clean.

-import {
-  Subsystem,
-  SyncCron,
-  type MicrosoftServiceCredentials,
-  type OAuthCredentials,
-  type SaaSJob,
-  type SaaSOAuthJob,
-} from "@/types"
+import {
+  Subsystem,
+  SyncCron,
+  type MicrosoftServiceCredentials,
+  type OAuthCredentials,
+  type SaaSOAuthJob,
+} from "@/types"
@@
-import { getConnector, getOAuthConnectorWithCredentials } from "@/db/connector"
+import { getOAuthConnectorWithCredentials } from "@/db/connector"
@@
-import { Client } from "@microsoft/microsoft-graph-client"
+// (removed) unused
@@
-import type { InvokeModelResponseFilterSensitiveLog } from "@aws-sdk/client-bedrock-runtime"
+// (removed) unused

If your linting still flags others (e.g., DriveItem), we can prune them too.

Also applies to: 9-9, 68-68, 73-73


848-849: Nit: prefer const for sites.

It isn’t reassigned.

-    let sites = await discoverSharePointSites(graphClient, email!)
+    const sites = await discoverSharePointSites(graphClient, email!)
📜 Review details

Configuration used: CodeRabbit UI

Review profile: CHILL

Plan: Pro

📥 Commits

Reviewing files that changed from the base of the PR and between 0738662 and f0d8916.

📒 Files selected for processing (3)
  • server/integrations/microsoft/index.ts (3 hunks)
  • server/package.json (1 hunks)
  • server/server.ts (3 hunks)
🚧 Files skipped from review as they are similar to previous changes (2)
  • server/server.ts
  • server/package.json
🧰 Additional context used
🧠 Learnings (1)
📚 Learning: 2025-05-28T10:47:41.020Z
Learnt from: naSim087
PR: xynehq/xyne#484
File: server/integrations/google/sync.ts:222-222
Timestamp: 2025-05-28T10:47:41.020Z
Learning: The functions `handleGoogleDriveChange` and `getDriveChanges` in `server/integrations/google/sync.ts` are intentionally exported for future changes, even though they are not currently being imported by other modules.

Applied to files:

  • server/integrations/microsoft/index.ts
🧬 Code graph analysis (1)
server/integrations/microsoft/index.ts (8)
server/db/schema/connectors.ts (2)
  • SelectConnector (146-146)
  • connectors (60-109)
server/integrations/microsoft/utils.ts (1)
  • loggerWithChild (23-25)
server/integrations/tracker.ts (1)
  • Tracker (89-325)
server/types.ts (1)
  • MicrosoftServiceCredentials (461-468)
server/integrations/microsoft/client.ts (1)
  • createMicrosoftGraphClient (59-94)
server/integrations/metricStream.ts (2)
  • sendWebsocketMessage (11-16)
  • closeWs (5-7)
server/integrations/microsoft/sharepoint/index.ts (3)
  • discoverSharePointSites (11-46)
  • discoverSiteDrives (49-109)
  • processSiteDrives (112-251)
server/errors/index.ts (1)
  • CouldNotFinishJobSuccessfully (194-203)
🔇 Additional comments (1)
server/integrations/microsoft/index.ts (1)

75-79: LGTM: SharePoint discovery/processing imports.

Imports align with the new service-account SharePoint flow.

@devesh-juspay devesh-juspay merged commit fe95f0b into main Sep 29, 2025
4 checks passed
@devesh-juspay devesh-juspay deleted the feat/microsoft-sharepoint branch September 29, 2025 09:51
@coderabbitai coderabbitai bot mentioned this pull request Oct 1, 2025
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants